Hey everybody,

thanks a lot for reading and giving feedback! In this mail I'll try to answer
all the points I found going through the thread, but if I miss something,
please feel free to let me know! I've numbered the discussed topics for ease
of reference down the road.

I'll go through the KIP and update it with everything that I have written
below after sending this mail.

@Tom:
(1) If I understand your concerns correctly, you feel that this
functionality would have a hard time getting approved into Apache Kafka
because the same can be achieved with custom Serializers, and that we
should perhaps develop it outside of Apache Kafka first.
I feel that it is precisely the fact that this is not part of core Apache
Kafka that makes people think twice about doing end-to-end encryption. I
may be working in a market (Germany) that is a bit special compared to the
rest of the world where encryption and similar topics are concerned, but
I've personally sat in multiple meetings where this feature was discussed.
It is not necessarily the end-to-end encryption itself that people want, but
the at-rest encryption that comes with it.
When people hear that this is not part of Apache Kafka itself, and that
they would need to develop something themselves, that is more often than
not the end of the discussion. Using something that is not "stock" is quite
often simply not an option.
Even if they decide to go forward with it, they'll find Hendrik's blog post
on this from 4 years ago, probably the whitepapers from Confluent and
Lenses, and maybe a few implementations on GitHub, all of which just serve
to further muddy the waters. Not because any of these resources are bad or
wrong, but because the information and implementations are spread out over
a lot of different places. Developing this outside of Apache Kafka would,
I'm afraid, simply add one more item to this list and would not really
make a difference.

I strongly feel that this is a needed feature in Kafka and that a large
number of people out there would want to use it, but I may very well be
mistaken: responses to this thread have not exactly been plentiful over the
last year and a half.

@Mike:
(2) Regarding the encryption of headers, my current idea is to keep this
configurable. I have seen customers use headers for things like account
numbers, which under the GDPR are considered personal data that should be
encrypted wherever possible. So in some instances it might be useful to
encrypt header fields as well.
My current PoC implementation allows specifying a regex for the headers that
should be encrypted, so encrypted and unencrypted headers can coexist in the
same record, which should hopefully suit most use cases.
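To make the regex idea concrete, here is a minimal Python sketch (not the PoC code; the function name and the modelling of headers as (name, bytes) pairs are assumptions for illustration) that encrypts only the values of headers whose name fully matches a configured pattern:

```python
import re
from typing import Callable, List, Tuple

# Headers modelled as (name, value) pairs, mirroring Kafka record headers.
# Plain Python for illustration, not the Kafka client API.
Header = Tuple[str, bytes]


def encrypt_selected_headers(headers: List[Header], pattern: str,
                             encrypt: Callable[[bytes], bytes]) -> List[Header]:
    """Encrypt the values of headers whose *name* fully matches `pattern`;
    all other headers pass through untouched. Header order is preserved."""
    regex = re.compile(pattern)
    return [(name, encrypt(value)) if regex.fullmatch(name) else (name, value)
            for name, value in headers]


# Usage with a placeholder transform (byte reversal). This is NOT encryption,
# it only makes the selection visible.
headers = [("account-number", b"DE1234"), ("trace-id", b"abc")]
print(encrypt_selected_headers(headers, r"account-.*", lambda v: v[::-1]))
# -> [('account-number', b'4321ED'), ('trace-id', b'abc')]
```

Whether the pattern should be a full match or a search, and whether it applies to header names only, are open choices; full match on names is simply the assumption made here.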

(3) Also, my plan is not to change the message format, but to
"encrypt in place" and add a header field with the information necessary
for decryption, which the decrypting consumer would then remove.
There may still be some out-of-date intentions in the KIP; I'll go through
it and update it.
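A rough Python sketch of the encrypt-in-place flow, assuming a hypothetical header name `encryption.info` that carries only a key id, and an injected cipher function; none of these names come from the KIP, and records are plain dataclasses rather than Kafka client objects:

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical header name; the KIP does not define one.
ENCRYPTION_HEADER = "encryption.info"

Cipher = Callable[[bytes], bytes]


@dataclass
class Record:
    key: bytes
    value: bytes
    headers: List[Tuple[str, bytes]] = field(default_factory=list)


def encrypt_in_place(record: Record, key_id: str, encrypt: Cipher) -> Record:
    """Producer side: same record layout, value replaced by ciphertext, plus one
    header telling the consumer how to decrypt (a real AEAD cipher would also
    need its nonce stored here)."""
    info = json.dumps({"key_id": key_id}).encode("utf-8")
    return Record(record.key, encrypt(record.value),
                  record.headers + [(ENCRYPTION_HEADER, info)])


def decrypt_in_place(record: Record, decrypters: Dict[str, Cipher]) -> Record:
    """Consumer side: decrypt with the key named in the header, then drop the header."""
    infos = [v for k, v in record.headers if k == ENCRYPTION_HEADER]
    if not infos:
        return record  # not encrypted: pass through unchanged
    key_id = json.loads(infos[0].decode("utf-8"))["key_id"]
    remaining = [(k, v) for k, v in record.headers if k != ENCRYPTION_HEADER]
    return Record(record.key, decrypters[key_id](record.value), remaining)


def toy_cipher(data: bytes) -> bytes:
    # Placeholder for the demo only: XOR with a constant is NOT encryption.
    return bytes(b ^ 0x5A for b in data)


original = Record(b"k1", b"hello", [("trace-id", b"1")])
on_the_wire = encrypt_in_place(original, "key-2020-05", toy_cipher)
assert on_the_wire.value != original.value
assert decrypt_in_place(on_the_wire, {"key-2020-05": toy_cipher}) == original
```

Because the consumer strips the header again, downstream code sees the record exactly as the producer wrote it before encryption.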

@Ryanne:
First off, I fully agree that we should avoid painting ourselves into a
corner with an early client-only implementation. I scaled down this KIP
from earlier attempts that included things like key rollover and
broker-side implementations, because for a long time I could not get any
feedback from the community on those and felt that maybe there was no
appetite for the full-blown solution. So I decided to try a more limited
scope. I am very happy to discuss/go for the fully featured version
again :)

(4) Regarding plaintext data in RocksDB instances, I am a bit torn, to be
honest. On the one hand, I feel like this scenario is not something that we
can fully control. Kafka Streams in this case is a client that takes data
from Kafka, decrypts it and then puts it somewhere in plaintext. To me this
scenario differs only slightly from, for example, someone writing a backup
job that reads a topic and writes it to a text file; there is not much we
can do about that.
That being said, Kafka Streams is part of Apache Kafka, so it does merit
special consideration. I'll have to dig into how StateStores are used a bit
(I am not the world's largest expert on that, or any kind of expert) to try
and come up with an idea.


(5) On key encryption and hashing, this is definitely an issue that we need
a solution for. I currently have key encryption configurable in my
implementation. When it is enabled, one option would of course be to hash
the original key and store the original key data together with the value
in encrypted form. Any salt added to the key before hashing could be
encrypted along with the data, although it would have to be the same for
every record with a given key so that equal keys still produce equal
hashes. This would allow all key-based functionality like compaction,
joins etc. to keep working without having to know the cleartext key.
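A minimal sketch of the hashed-key scheme, using HMAC-SHA256 (a keyed hash, swapped in here for "hash plus salt") so that low-entropy keys such as account numbers cannot simply be brute-forced. All function names are hypothetical, and `encrypt`/`decrypt` stand for whatever cipher is configured:

```python
import hashlib
import hmac
import json
from typing import Callable, Tuple

Cipher = Callable[[bytes], bytes]


def pseudonymize_key(key: bytes, secret: bytes) -> bytes:
    """Keyed, deterministic hash of the cleartext key. Equal keys map to equal
    outputs, so compaction and key-based joins still line up; without the
    secret, nobody can recompute the hash from guessed keys."""
    return hmac.new(secret, key, hashlib.sha256).digest()


def wrap(key: bytes, value: bytes, secret: bytes, encrypt: Cipher) -> Tuple[bytes, bytes]:
    """Return (record key, record value) as they would be written to the topic:
    the hashed key, plus the ciphertext of the original key and value together."""
    payload = json.dumps({"key": key.hex(), "value": value.hex()}).encode("utf-8")
    return pseudonymize_key(key, secret), encrypt(payload)


def unwrap(record_value: bytes, decrypt: Cipher) -> Tuple[bytes, bytes]:
    """Recover (original key, original value) from the encrypted record value."""
    payload = json.loads(decrypt(record_value).decode("utf-8"))
    return bytes.fromhex(payload["key"]), bytes.fromhex(payload["value"])
```

The HMAC secret has to stay fixed for a topic: changing it changes every hashed key, which is exactly the rotation question that remains open for this approach.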

I've also considered deterministic encryption, which would keep the
encrypted key the same, but I am fairly certain that we will want to allow
regular key rotation (more on this in the next paragraph) without
re-encrypting older data, and that would then change the encrypted key and
break all of these things.
Regarding re-encrypting existing keys when a crypto key is compromised, I
think we need to be very careful if we do this in place on the broker. If
we add functionality along the lines of compaction, which reads,
re-encrypts and rewrites segment files, we have to make sure that producers
choose partitions based on the cleartext value of the key; otherwise all
records from the key change onwards may go to a different partition of the
topic.
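To illustrate the partitioning point, a small Python sketch, assuming the producer computes the partition from the cleartext key before the key is hashed or encrypted. `zlib.crc32` stands in for Kafka's murmur2-based default partitioner, and the helper names are hypothetical:

```python
import hashlib
import hmac
import zlib
from typing import Callable, Tuple


def partition_for(key: bytes, num_partitions: int) -> int:
    # Stand-in for Kafka's default partitioner (which uses murmur2);
    # crc32 keeps this sketch stdlib-only.
    return zlib.crc32(key) % num_partitions


def prepare(cleartext_key: bytes, num_partitions: int,
            transform_key: Callable[[bytes], bytes]) -> Tuple[int, bytes]:
    """Choose the partition from the cleartext key *before* the key is hashed
    or encrypted, so the choice does not depend on which crypto key is in use."""
    return partition_for(cleartext_key, num_partitions), transform_key(cleartext_key)


def keyed(secret: bytes) -> Callable[[bytes], bytes]:
    # Key transform bound to one crypto key version (HMAC-SHA256 as an example).
    return lambda k: hmac.new(secret, k, hashlib.sha256).digest()


p_old, k_old = prepare(b"customer-4711", 12, keyed(b"key-v1"))
p_new, k_new = prepare(b"customer-4711", 12, keyed(b"key-v2"))
assert p_old == p_new  # same partition across a crypto key change...
assert k_old != k_new  # ...even though the stored key bytes differ
```

With the Java client, a partition computed like this could be passed explicitly via the `ProducerRecord(topic, partition, key, value)` constructor instead of leaving the choice to the partitioner.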

(6) Key rollover would be a cool feature to have. To be honest, up until now
I was only thinking about supporting regular key rollover that changes the
key for all records going forward, mostly for complexity reasons; I think
there was actually a sentence to this effect in the original KIP. But if you
and others feel this is needed, then I am happy to discuss it.
If we implement this on the broker, we could take inspiration from topic
compaction: read all segment files and check the records one by one, and if
the key used for a record has been "retired/compromised/...", re-encrypt it
with the new key and write a new segment file. There are lots of things to
consider here regarding performance, how to trigger it etc., but in
principle I think this could work.
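A compaction-style re-encryption pass might look roughly like the following Python sketch, assuming each stored record carries the id of the key that encrypted it. The record type and function are hypothetical, and a real implementation would work on segment files rather than in-memory lists:

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set

Cipher = Callable[[bytes], bytes]


@dataclass(frozen=True)
class StoredRecord:
    offset: int
    key_id: str  # id of the crypto key that encrypted this record
    ciphertext: bytes


def reencrypt_segment(records: Iterable[StoredRecord], retired: Set[str],
                      decrypters: Dict[str, Cipher], new_key_id: str,
                      new_encrypt: Cipher) -> List[StoredRecord]:
    """Build the contents of a replacement segment: records encrypted with a
    retired key are decrypted and re-encrypted under the new key, all others
    are copied as-is. Offsets are preserved, as with log compaction."""
    out = []
    for r in records:
        if r.key_id in retired:
            plaintext = decrypters[r.key_id](r.ciphertext)
            out.append(StoredRecord(r.offset, new_key_id, new_encrypt(plaintext)))
        else:
            out.append(r)
    return out


def xor_with(k: int) -> Cipher:
    # Placeholder ciphers for the demo only: XOR with a constant is NOT secure.
    return lambda data: bytes(b ^ k for b in data)


v1, v2 = xor_with(0x11), xor_with(0x22)
segment = [StoredRecord(0, "v1", v1(b"alpha")), StoredRecord(1, "v2", v2(b"beta"))]
rewritten = reencrypt_segment(segment, {"v1"}, {"v1": v1, "v2": v2}, "v2", v2)
```

Note that the pass needs decrypt functions for the retired keys, so whatever runs it must be able to obtain the data keys.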
One issue I can see with this: if we use envelope encryption for the keys
to address the rogue admin issue, so that the broker doesn't have access to
the actual key encrypting the data, this operation becomes impossible.



I hope I got to all the items that were raised, but I may very well have
overlooked something; please let me know if I did, and of course what you
think about what I wrote!

I'll update the KIP today as well.

Best regards,
Sönke




On Thu, 7 May 2020 at 19:54, Ryanne Dolan <ryannedo...@gmail.com> wrote:

> Tom, good point, I've done exactly that -- hashing record keys -- but it's
> unclear to me what should happen when the hash key must be rotated. In my
> case the (external) solution involved rainbow tables, versioned keys, and
> custom materializers that were aware of older keys for each record.
>
> In particular I had a pipeline that would re-key records and re-ingest
> them, while opportunistically overwriting records materialized with the old
> key.
>
> For a native solution I think maybe we'd need to carry around any old
> versions of each record key, perhaps as metadata. Then brokers and
> materializers can compact records based on _any_ overlapping key, maybe?
> Not sure.
>
> Ryanne
>
> On Thu, May 7, 2020, 12:05 PM Tom Bentley <tbent...@redhat.com> wrote:
>
> > Hi Ryanne,
> >
> > You raise some good points there.
> >
> > > Similarly, if the whole record is encrypted, it becomes impossible to do
> > > joins, group bys etc, which just need the record key and maybe don't have
> > > access to the encryption key. Maybe only record _values_ should be
> > > encrypted, and maybe Kafka Streams could defer decryption until the actual
> > > value is inspected. That way joins etc are possible without the encryption
> > > key, and RocksDB would not need to decrypt values before materializing to
> > > disk.
> > >
> >
> > It's getting a bit late here, so maybe I overlooked something, but wouldn't
> > the natural thing to do be to make the "encrypted" key a hash of the
> > original key, and let the value of the encrypted value be the cipher text
> > of the (original key, original value) pair. A scheme like this would
> > preserve equality of the key (strictly speaking there's a chance of
> > collision of course). I guess this could also be a solution for the
> > compacted topic issue Sönke mentioned.
> >
> > Cheers,
> >
> > Tom
> >
> >
> >
> > On Thu, May 7, 2020 at 5:17 PM Ryanne Dolan <ryannedo...@gmail.com> wrote:
> >
> > > Thanks Sönke, this is an area in which Kafka is really, really far behind.
> > >
> > > I've built secure systems around Kafka as laid out in the KIP. One issue
> > > that is not addressed in the KIP is re-encryption of records after a key
> > > rotation. When a key is compromised, it's important that any data encrypted
> > > using that key is immediately destroyed or re-encrypted with a new key.
> > > Ideally first-class support for end-to-end encryption in Kafka would make
> > > this possible natively, or else I'm not sure what the point would be. It
> > > seems to me that the brokers would need to be involved in this process, so
> > > perhaps a client-first approach will be painting ourselves into a corner.
> > > Not sure.
> > >
> > > Another issue is whether materialized tables, e.g. in Kafka Streams, would
> > > see unencrypted or encrypted records. If we implemented the KIP as written,
> > > it would still result in a bunch of plain text data in RocksDB everywhere.
> > > Again, I'm not sure what the point would be. Perhaps using custom serdes
> > > would actually be a more holistic approach, since Kafka Streams etc could
> > > leverage these as well.
> > >
> > > Similarly, if the whole record is encrypted, it becomes impossible to do
> > > joins, group bys etc, which just need the record key and maybe don't have
> > > access to the encryption key. Maybe only record _values_ should be
> > > encrypted, and maybe Kafka Streams could defer decryption until the actual
> > > value is inspected. That way joins etc are possible without the encryption
> > > key, and RocksDB would not need to decrypt values before materializing to
> > > disk.
> > >
> > > This is why I've implemented encryption on a per-field basis, not at the
> > > record level, when addressing kafka security in the past. And I've had to
> > > build external pipelines that purge, re-encrypt, and re-ingest records when
> > > keys are compromised.
> > >
> > > This KIP might be a step in the right direction, not sure. But I'm hesitant
> > > to support the idea of end-to-end encryption without a plan to address the
> > > myriad other problems.
> > >
> > > That said, we need this badly and I hope something shakes out.
> > >
> > > Ryanne
> > >
> > > On Tue, Apr 28, 2020, 6:26 PM Sönke Liebau
> > > <soenke.lie...@opencore.com.invalid> wrote:
> > >
> > > > All,
> > > >
> > > > I've asked for comments on this KIP in the past, but since I didn't really
> > > > get any feedback I've decided to reduce the initial scope of the KIP a bit
> > > > and try again.
> > > >
> > > > I have reworked the KIP to provide a limited, but useful set of features
> > > > for this initial KIP and laid out a very rough roadmap of what I'd
> > > > envision this looking like in a final version.
> > > >
> > > > I am aware that the KIP is currently light on implementation details, but
> > > > would like to get some feedback on the general approach before fully
> > > > speccing everything.
> > > >
> > > > The KIP can be found at
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+end-to-end+data+encryption+functionality+to+Apache+Kafka
> > > >
> > > > I would very much appreciate any feedback!
> > > >
> > > > Best regards,
> > > > Sönke
> > >
> >
>


-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany
