I would like to take a look at your code; a branch would be welcome. Thanks!
This is an interesting use case because it isn't using Avro
serialization for file or messaging persistence ... well, not
strictly! It's an interesting approach to have the generated record
_itself_ responsible for ensuring that it is nicely encrypted as it
passes over a network link or is spilled to disk temporarily, or saved
in a streaming checkpoint, etc!
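To make that idea concrete, here is a rough sketch of a record that encrypts its own serialized form. It is not the proposed Avro change: `SerializationHooks`, `TenantedRecord` and `HooksDemo` are hypothetical names, the hooks work on plain byte arrays instead of Avro's Encoder/Decoder, and the tenant key is simply passed in (a real service would presumably look it up from the tenant id carried by the record). It only uses the JDK's AES-GCM support.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Hypothetical, simplified hooks: byte[] in and out instead of Avro Encoder/Decoder. */
interface SerializationHooks {
    byte[] afterSerialization(byte[] serialized) throws GeneralSecurityException;
    byte[] beforeDeserialization(byte[] stored) throws GeneralSecurityException;
}

/** A record that is responsible for encrypting its own serialized bytes. */
final class TenantedRecord implements SerializationHooks {
    private static final int IV_BYTES = 12;   // nonce size commonly used with GCM
    private static final int TAG_BITS = 128;  // authentication tag length

    // Assumption: the key was resolved from the tenant that owns this record.
    private final SecretKey tenantKey;

    TenantedRecord(SecretKey tenantKey) {
        this.tenantKey = tenantKey;
    }

    @Override
    public byte[] afterSerialization(byte[] serialized) throws GeneralSecurityException {
        // Fresh random nonce per message, stored in front of the ciphertext.
        byte[] iv = new byte[IV_BYTES];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, tenantKey, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ciphertext = cipher.doFinal(serialized);
        return ByteBuffer.allocate(iv.length + ciphertext.length).put(iv).put(ciphertext).array();
    }

    @Override
    public byte[] beforeDeserialization(byte[] stored) throws GeneralSecurityException {
        // Read the nonce back from the first IV_BYTES bytes, then decrypt the rest.
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, tenantKey,
                new GCMParameterSpec(TAG_BITS, stored, 0, IV_BYTES));
        return cipher.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
    }
}

/** Round-trip check: encrypt after "serialization", decrypt before "deserialization". */
final class HooksDemo {
    static boolean roundTrips() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            TenantedRecord record = new TenantedRecord(generator.generateKey());
            byte[] plain = "already-serialized bytes".getBytes(StandardCharsets.UTF_8);
            byte[] stored = record.afterSerialization(plain);
            return !Arrays.equals(plain, stored)
                    && Arrays.equals(plain, record.beforeDeserialization(stored));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Whatever moves the bytes around (network, spill files, checkpoints) would only ever see the output of afterSerialization in this sketch, which is the property I find interesting here.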
I've only ever used Flink in the context of Beam, where you could
write a custom coder ("EncryptedAvroCoder") for objects in a
PCollection and override all serialization for that distributed
collection. Does something similar exist in pure Flink?
All my best, Ryan
On Thu, Jun 18, 2020 at 2:18 PM Enrico Agnoli
<[email protected]> wrote:
>
> Hi Ryan,
> Thanks for getting back to me.
>
> Yes, the change is for the Java library. As you mention, in other languages
> it doesn't seem easy to build this as a library that can delegate like we do
> in the JVM. It is, however, feasible to deserialize the data in another
> language, given access to the same encryption libraries, as the structure of
> the serialized object is known to the developer.
>
> I modified the GenericDatumWriter/Reader because that is where I found the
> main entry methods:
> ```
> public D read(D reuse, Decoder in) throws IOException
> ```
> And
> ```
> public void write(D datum, Encoder out) throws IOException
> ```
>
> I also have a generalized template, used for all our "tenanted" schemas,
> that extends an abstract class and delegates beforeDeserialization and
> afterSerialization to it, so the code is centralized.
>
> About customEncode, I didn't go that route. To tell you the truth, I didn't
> find much documentation.
> I did, however, try a couple of other extension points, one of which was
> logical types. As you can see in the signature
> ```
> public ByteBuffer fromBytes(ByteBuffer value, Schema schema, LogicalType type)
> ```
> there we don't have access to the original object where we would have the
> tenant information needed to retrieve the right token to use to encrypt the
> data.
>
> Would it make sense that I open a branch to show some code?
>
> Best,
> -Enrico
>
> On 6/17/20, 4:39 PM, "Ryan Skraba" <[email protected]> wrote:
>
> Hi! I was interested enough to watch the entire video from Flink Forward.
>
> I do think this is a good proposal, and adding hooks to "customize"
> the serialized bytes is a pretty neat idea. The developer can benefit
> from learning or using Avro-generated classes and the SDK, and still
> use standard serialization underneath the customized logic.
>
> At first glance, this would stay in the Java SDK, right? I mean, once
> you've customized your Avro specific record with its own
> serialization layer, there's little hope (without extensive work) for
> a different language to be able to read it. In other words,
> you'd never be able to write it to an Avro file and expect it to
> be readable via another programming language or using a generic
> model... which is kind of the point!
>
> Is there any use to having these changes in the
> GenericDatumWriter/Reader as opposed to the
> SpecificDatumWriter/Reader? Would there ever be an instance where a
> generic model of data would delegate serialization?
>
> Do you think that the necessary changes you've made to the specific
> data templates could be generalized? I believe I've already come
> across a situation where we've customized the "extends
> MySpecificRecordBase" part of the templates -- it could be a
> configuration option. I'm not sure whether passing along the record
> context (tenant id) to nested elements is generalizable, but I haven't
> thought very hard about it yet.
>
> Have you looked into the `customEncode` parts of generated specific
> records? This or something similar might be a more flexible technique
> than the SerializeFinalizationDelegate interface methods.
>
> Thanks for sharing! Ryan
>
> On Tue, Jun 16, 2020 at 3:02 PM Enrico Agnoli
> <[email protected]> wrote:
> >
> > Hi,
> >
> > I would like to propose a change to Avro that allows services to
> > integrate some logic after serialization and before deserialization.
> > We use Avro for data serialization in our streaming infrastructure,
> > and we decided to extend it so that we can encrypt the data using
> > information available directly on the data itself: its owner.
> > The change-set is pretty small and I would like to hear from you if it
> > makes sense to contribute it back to the project.
> >
> > == The problem is:
> > Multi-tenant applications need to encrypt data (with the keys of the
> > owner/tenant that generated that piece of data) every time it is
> > serialized, to avoid commingling data from different tenants. To do so
> > transparently to the application, the ideal place to implement the
> > encryption is in the serialization library (Avro).
> >
> > == Proposal:
> > We modified the Avro code to add afterSerialization and
> > beforeDeserialization hooks that can use object-defined values (the
> > tenant/owner of that data) to implement encryption.
> > In the code we propose to submit, we implemented a new interface,
> > `SerializeFinalizationDelegate.java`:
> > ```
> > public interface SerializeFinalizationDelegate {
> >   void afterSerialization(ByteArrayOutputStream serializedData, Encoder finalEncoder);
> >   Decoder beforeDeserialization(Decoder dataToDecode);
> > }
> > ```
> > This needs to be implemented by any Avro-serializable class that wants
> > to define post-serialization or pre-deserialization logic.
> > `GenericDatumWriter` and `GenericDatumReader` are modified to delegate
> > to the object's implementation of the methods above.
> >
> > More info can be found at
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.slideshare.net_FlinkForward_multi-2Dtenanted-2Dstreams-2Dworkday-2Denrico-2Dagnoli-2Dleire-2Dfernandez-2Dde-2Dretana-2Droitegui-2Dworkday-2D185815223&d=DwIFaQ&c=DS6PUFBBr_KiLo7Sjt3ljp5jaW5k2i9ijVXllEdOozc&r=5oal4CtBGP1ioAe2G2rMT-XLCpWwh5R4aEw1TqtlCnc&m=Xu7g3Tz4gpvKrNVQaH8E_gOocZRRxOjiYDGo8Y44Peg&s=dea8kpG8JMBbu6GIqT176VBrvrIrnXdoMByO2cD9SS4&e=
> > from slide 21
> >
> >
> > What do you think about this proposal? I wanted to first start a
> > discussion, but if it helps I can create a patch or a branch to show the
> > change.
> >
> > Hope to hear from you,
> > -Enrico
>
>