Hi Ryan,
Thanks for getting back to me.

Yes, the change is for the Java library. As you mention, in other languages it 
doesn't seem easy to package this as a library that can delegate the way we do 
in the JVM. It is, however, feasible to deserialize the data in another 
language, given access to the same encryption libraries, as the structure of 
the serialized object is known to the developer.

I modified the GenericDatumWriter/Reader because that is where I found the 
main entry methods: 
```
  public D read(D reuse, Decoder in) throws IOException 
```
And
```
  public void write(D datum, Encoder out) throws IOException 
```

I also have a generalized template that is used for all our "tenanted" 
schemas. It extends an abstract class and delegates beforeDeserialization and 
afterSerialization to it, so that the code is centralized.
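
As a rough illustration of that centralization (not the actual template code), 
the sketch below puts both hooks in one abstract base class. Everything in it 
is an assumption made for the example: the names TenantedRecordBase, 
getTenantId and OrderEvent, the byte[] signatures (the proposed interface works 
on Encoder/Decoder instead), the choice of AES-GCM, and the key derivation, 
which stands in for a real per-tenant key lookup. It also assumes the tenant id 
is already known on the instance before decoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical base class: every "tenanted" record extends it, so the hook
// logic lives in one place instead of in each generated class.
abstract class TenantedRecordBase {
    private static final int IV_LEN = 12;    // bytes, the usual GCM nonce size
    private static final int TAG_BITS = 128; // GCM authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    // Each concrete record exposes the tenant that owns it.
    abstract String getTenantId();

    // ASSUMPTION: placeholder key lookup. A real system would fetch the
    // tenant's key from a key-management service, not derive it like this.
    private static SecretKeySpec keyFor(String tenantId) throws GeneralSecurityException {
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(tenantId.getBytes(StandardCharsets.UTF_8));
        return new SecretKeySpec(Arrays.copyOf(digest, 16), "AES"); // AES-128
    }

    // Runs after normal serialization: encrypts the bytes and prepends the IV.
    byte[] afterSerialization(byte[] serialized) {
        try {
            byte[] iv = new byte[IV_LEN];
            RANDOM.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, keyFor(getTenantId()),
                    new GCMParameterSpec(TAG_BITS, iv));
            byte[] encrypted = cipher.doFinal(serialized);
            return ByteBuffer.allocate(IV_LEN + encrypted.length)
                    .put(iv).put(encrypted).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    // Runs before normal deserialization: reads the IV back and decrypts.
    byte[] beforeDeserialization(byte[] stored) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, keyFor(getTenantId()),
                    new GCMParameterSpec(TAG_BITS, stored, 0, IV_LEN));
            return cipher.doFinal(stored, IV_LEN, stored.length - IV_LEN);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed", e);
        }
    }
}

// Hypothetical concrete record: it only has to say who owns it.
final class OrderEvent extends TenantedRecordBase {
    private final String tenantId;

    OrderEvent(String tenantId) {
        this.tenantId = tenantId;
    }

    @Override
    String getTenantId() {
        return tenantId;
    }
}
```

With this shape, a record created for one tenant round-trips its own bytes, 
while a record for a different tenant fails to decrypt them because the GCM 
tag check rejects the wrong key.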

About customEncode, I didn't go down that route. To tell you the truth, I 
didn't find much documentation.
I did, however, try a couple of other extension points, one of which was 
logical types. As you can see in the signature
```
    public ByteBuffer fromBytes(ByteBuffer value, Schema schema, LogicalType type)
```
there we don't have access to the original object, which holds the tenant 
information needed to retrieve the right token for encrypting the data.
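
To illustrate the scoping problem, here is a small sketch that contrasts a 
field-level hook with a record-level one. It does not use the Avro API: 
FieldLevelConversion, RecordLevelHook, Payment and HookScopes are hypothetical 
stand-ins, and the record-level hook only tags the bytes with the tenant id 
where a real one would look up the tenant's key and encrypt.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a field-level conversion: like the signature
// above, it receives only the field's bytes, never the enclosing record.
interface FieldLevelConversion {
    ByteBuffer fromBytes(ByteBuffer value);
}

// Hypothetical record-level hook: it is handed the whole record, so the
// tenant is in scope when a key has to be chosen.
interface RecordLevelHook<R> {
    byte[] afterSerialization(R record, byte[] serialized);
}

// Toy record with the tenant stored next to the payload.
final class Payment {
    final String tenantId;
    final byte[] cardNumber;

    Payment(String tenantId, byte[] cardNumber) {
        this.tenantId = tenantId;
        this.cardNumber = cardNumber;
    }
}

final class HookScopes {
    // Can only work with the bytes it is given; nothing here says whose
    // data this is, so no per-tenant key can be selected.
    static final FieldLevelConversion FIELD = value -> value.duplicate();

    // Prefixes the output with the tenant id to show that the record is
    // reachable (ASSUMPTION: a real hook would encrypt with that tenant's key).
    static final RecordLevelHook<Payment> RECORD = (record, serialized) -> {
        byte[] tag = (record.tenantId + ":").getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(tag.length + serialized.length)
                .put(tag).put(serialized).array();
    };
}
```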

Would it make sense for me to open a branch to show some code?

Best,
-Enrico

On 6/17/20, 4:39 PM, "Ryan Skraba" <[email protected]> wrote:

    Hi!  I was interested enough to watch the entire video from Flink Forward.
    
    I do think this is a good proposal, and adding hooks to "customize"
    the serialized bytes is a pretty neat idea.  The developer can benefit
    from learning or using Avro-generated classes and the SDK, and still
    using standard serialization underneath the customized logic.
    
    At first glance, this would stay in the Java SDK, right?  I mean, once
    you've customized your Avro specific record with its own
    serialization layer, there's little hope (without extensive work) for
    a different language to expect to be able to read it.  In other words,
    you'd never be able to write it to an Avro file and never expect it to
    be readable via another programming language or using a generic
    model... which is kind of the point!
    
    Is there any use to having these changes in the
    GenericDatumWriter/Reader as opposed to the
    SpecificDatumWriter/Reader?  Would there ever be an instance where a
    generic model of data would delegate serialization?
    
    Do you think that the necessary changes you've made to the specific
    data templates could be generalized?  I believe I've already come
    across a situation where we've customized the "extends
    MySpecificRecordBase" part of the templates -- it could be a
    configuration option.  I'm not sure whether passing along the record
    context (tenant id) to nested elements is generalizable, but I haven't
    thought very hard about it yet.
    
    Have you looked into the `customEncode` parts of generated specific
    records?  This or something similar might be a more flexible technique
    than the SerializeFinalizationDelegate interface methods.
    
    Thanks for sharing!  Ryan
    
    On Tue, Jun 16, 2020 at 3:02 PM Enrico Agnoli
    <[email protected]> wrote:
    >
    > Hi,
    >
    > I would like to propose a change to AVRO to allow services to
    > integrate some logic after serialization and before deserialization.
    > We use AVRO to support the data serialization in our streaming
    > infrastructure and we decided to extend it so that we can encrypt
    > the data with info available directly on the data itself: its owner.
    > The change-set is pretty small and I would like to hear from you if it
    > makes sense to contribute it back to the project.
    >
    > == The problem is:
    > Multi-tenant applications need to encrypt data (with the keys
    > of the owner/tenant that generated that piece of data) every time it is
    > serialized, to avoid commingling different tenants' data. To do so
    > transparently to the application, the ideal place to implement the
    > encryption is in the serialization library (AVRO).
    >
    > == Proposal:
    > We modified the AVRO code to have afterSerialization and
    > beforeDeserialization hooks that can use object-defined values (the
    > tenant/owner of that data) to implement encryption.
    > In the code we propose to submit we implemented a new interface:
    > `SerializeFinalizationDelegate.java`
    > ```
    > public interface SerializeFinalizationDelegate {
    >   void afterSerialization(ByteArrayOutputStream serializedData, Encoder finalEncoder);
    >   Decoder beforeDeserialization(Decoder dataToDecode);
    > }
    > ```
    > That needs to be implemented by any AVRO serializable class that wants to
    > define post-serialization or pre-deserialization logic.
    > `GenericDatumWriter` and `GenericDatumReader` are modified to delegate to
    > the object's implementation of the methods above.
    >
    > More info can be found at
    > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.slideshare.net_FlinkForward_multi-2Dtenanted-2Dstreams-2Dworkday-2Denrico-2Dagnoli-2Dleire-2Dfernandez-2Dde-2Dretana-2Droitegui-2Dworkday-2D185815223&d=DwIFaQ&c=DS6PUFBBr_KiLo7Sjt3ljp5jaW5k2i9ijVXllEdOozc&r=5oal4CtBGP1ioAe2G2rMT-XLCpWwh5R4aEw1TqtlCnc&m=Xu7g3Tz4gpvKrNVQaH8E_gOocZRRxOjiYDGo8Y44Peg&s=dea8kpG8JMBbu6GIqT176VBrvrIrnXdoMByO2cD9SS4&e=
    > from slide 21
    >
    >
    > What do you think about this proposal? I wanted to first start a
    > discussion, but if it helps I can create a patch or a branch to show the change.
    >
    > Hope to hear from you,
    > -Enrico
    
