Hi!  I was interested enough to watch the entire video from Flink Forward.

I do think this is a good proposal, and adding hooks to "customize"
the serialized bytes is a pretty neat idea.  The developer still
benefits from learning and using Avro-generated classes and the SDK,
while standard serialization keeps running underneath the customized
logic.

At first glance, this would stay in the Java SDK, right?  I mean, once
you've customized your Avro specific record with its own
serialization layer, there's little hope (without extensive work) for
a different language to expect to be able to read it.  In other words,
you'd never be able to write it to an Avro file and expect it to
be readable via another programming language or using a generic
model... which is kind of the point!
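
To make that concrete, here is a minimal, Avro-free sketch of the
idea as I understand it.  Everything in it is an assumption for
illustration: `BytesFinalizer`, `TenantRecord` and `HookSketch` are
hypothetical names, the signatures are simplified to plain byte
arrays rather than the Encoder/Decoder ones in the proposal, UTF-8
bytes stand in for Avro binary encoding, and the XOR step is only a
placeholder for real encryption.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical, simplified stand-in for the proposed delegate (not an Avro API). */
interface BytesFinalizer {
    byte[] afterSerialization(byte[] serialized);
    byte[] beforeDeserialization(byte[] stored);
}

/** Toy record that knows its owner, as in the multi-tenant use case. */
final class TenantRecord implements BytesFinalizer {
    final String tenantId; // must be non-empty in this sketch
    final String payload;

    TenantRecord(String tenantId, String payload) {
        if (tenantId.isEmpty()) {
            throw new IllegalArgumentException("tenantId must not be empty");
        }
        this.tenantId = tenantId;
        this.payload = payload;
    }

    // Placeholder transform: XOR with the tenant id bytes. NOT real encryption.
    private byte[] xorWithTenant(byte[] in) {
        byte[] key = tenantId.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = (byte) (in[i] ^ key[i % key.length]);
        }
        return out;
    }

    @Override
    public byte[] afterSerialization(byte[] serialized) {
        return xorWithTenant(serialized);
    }

    @Override
    public byte[] beforeDeserialization(byte[] stored) {
        return xorWithTenant(stored); // XOR is its own inverse
    }
}

final class HookSketch {
    /** Stand-in for the standard encoding step (Avro binary in the real case). */
    static byte[] standardEncode(TenantRecord r) {
        return r.payload.getBytes(StandardCharsets.UTF_8);
    }

    /** Writer: standard encoding first, then the record's own hook. */
    static byte[] write(TenantRecord r) {
        return r.afterSerialization(standardEncode(r));
    }

    /** Reader: the record's hook first, then standard decoding. */
    static String read(TenantRecord context, byte[] stored) {
        byte[] plain = context.beforeDeserialization(stored);
        return new String(plain, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        TenantRecord r = new TenantRecord("tenant-a", "hello");
        byte[] stored = write(r);
        System.out.println("same as standard bytes: "
                + java.util.Arrays.equals(stored, standardEncode(r)));
        System.out.println("round trip: " + read(r, stored));
    }
}
```

A reader that only knows the schema sees the output of `write`, not
of `standardEncode`, so it has nothing it can decode unless it also
has the same hook and the tenant context.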

Is there any use to having these changes in the
GenericDatumWriter/Reader as opposed to the
SpecificDatumWriter/Reader?  Would there ever be an instance where a
generic model of data would delegate serialization?

Do you think that the necessary changes you've made to the specific
data templates could be generalized?  I believe I've already come
across a situation where we've customized the "extends
MySpecificRecordBase" part of the templates -- it could be a
configuration option.  I'm not sure whether passing along the record
context (tenant id) to nested elements is generalizable, but I haven't
thought very hard about it yet.

Have you looked into the `customEncode` parts of generated specific
records?  This or something similar might be a more flexible technique
than the SerializeFinalizationDelegate interface methods.

Thanks for sharing!  Ryan

On Tue, Jun 16, 2020 at 3:02 PM Enrico Agnoli
<[email protected]> wrote:
>
> Hi,
>
> I would like to propose a change to AVRO that allows services to run
> some logic after serialization and before deserialization.
> We use AVRO to support the data serialization in our streaming infrastructure 
> and we decided to extend it to provide us the possibility to encrypt the data 
> with info available directly on the data itself: the owner of it.
> The change-set is pretty small and I would like to hear from you if it makes 
> sense to contribute it back to the project.
>
> == The problem is:
> Multi-tenant applications need to encrypt data (with the keys of
> the owner/tenant that generated that piece of data) every time it is
> serialized, to avoid commingling different tenants' data. To do so
> transparently to the application, the ideal place to implement the
> encryption is in the serialization library (AVRO).
>
> == Proposal:
> We modified the AVRO code to have afterSerialization and 
> beforeDeserialization hooks that can use object defined values (the 
> tenant/owner of that data) to implement encryption.
> In the code we propose to submit we implemented a new interface: 
> `SerializeFinalizationDelegate.java`
> ```
> public interface SerializeFinalizationDelegate {
>   void afterSerialization(ByteArrayOutputStream serializedData,
>                           Encoder finalEncoder);
>   Decoder beforeDeserialization(Decoder dataToDecode);
> }
> ```
> It must be implemented by any AVRO-serializable class that wants to
> define post-serialization or pre-deserialization logic.
> `GenericDatumWriter` and `GenericDatumReader` are modified to delegate to the 
> object implementation of the methods above.
>
> More info can be found at 
> https://www.slideshare.net/FlinkForward/multi-tenanted-streams-workday-enrico-agnoli-leire-fernandez-de-retana-roitegui-workday-185815223
>  from slide 21
>
>
> What do you think about this proposal? I wanted to first start a discussion, 
> but if it helps I can create a patch or a branch to show the change,
>
> Hope to hear from you,
> -Enrico
