Hi,

I would like to propose a change to AVRO that lets services plug in custom 
logic after serialization and before deserialization.
We use AVRO for data serialization in our streaming infrastructure, and we 
extended it so that we can encrypt the data using information available on 
the data itself: its owner.
The change-set is pretty small, and I would like to hear from you whether it 
makes sense to contribute it back to the project.

== Problem:
Multi-tenant applications need to encrypt data (with the keys of the 
owner/tenant that generated that piece of data) every time it is serialized, 
to avoid commingling data from different tenants. To do this transparently to 
the application, the ideal place to implement the encryption is the 
serialization library (AVRO).

== Proposal:
We modified the AVRO code to add afterSerialization and beforeDeserialization 
hooks that can use object-defined values (the tenant/owner of that data) to 
implement encryption.
In the code we propose to submit, we added a new interface, 
`SerializeFinalizationDelegate.java`:
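```
import java.io.ByteArrayOutputStream;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.Encoder;

public interface SerializeFinalizationDelegate {
  // Hook invoked after the object has been serialized.
  void afterSerialization(ByteArrayOutputStream serializedData, Encoder finalEncoder);
  // Hook invoked before deserialization; returns the Decoder to read from.
  Decoder beforeDeserialization(Decoder dataToDecode);
}
```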
This interface must be implemented by any AVRO-serializable class that wants 
to define post-serialization or pre-deserialization logic.
`GenericDatumWriter` and `GenericDatumReader` are modified to delegate to the 
object's implementation of the methods above.
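
To give a rough idea of what the two hooks could do, here is a minimal, 
dependency-free sketch. It is not the code from our patch and it does not use 
the AVRO classes: it works on plain byte arrays instead of `Encoder`/`Decoder`, 
and `TenantCryptoSketch`, `keyFor`, the in-memory key map and the envelope 
layout are all hypothetical names and choices made only for illustration. 
AES-GCM is likewise just an assumption; any cipher keyed per tenant would fit 
the same shape.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical sketch: none of these names come from AVRO or from the patch.
final class TenantCryptoSketch {
    private static final int IV_BYTES = 12;   // commonly recommended IV size for GCM
    private static final int TAG_BITS = 128;  // GCM authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    // Stand-in for a real key management service: one AES key per tenant.
    private final Map<String, SecretKey> keys = new ConcurrentHashMap<>();

    SecretKey keyFor(String tenantId) {
        return keys.computeIfAbsent(tenantId, id -> {
            try {
                KeyGenerator generator = KeyGenerator.getInstance("AES");
                generator.init(128);
                return generator.generateKey();
            } catch (GeneralSecurityException e) {
                throw new IllegalStateException(e);
            }
        });
    }

    // Plays the role of afterSerialization: wraps the serialized bytes as
    // [tenant id length][tenant id][IV][ciphertext + tag].
    byte[] afterSerialization(String tenantId, byte[] serialized)
            throws GeneralSecurityException {
        byte[] id = tenantId.getBytes(StandardCharsets.UTF_8);
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, keyFor(tenantId),
                new GCMParameterSpec(TAG_BITS, iv));
        cipher.updateAAD(id); // bind the ciphertext to its tenant id
        byte[] encrypted = cipher.doFinal(serialized);
        return ByteBuffer.allocate(4 + id.length + iv.length + encrypted.length)
                .putInt(id.length).put(id).put(iv).put(encrypted).array();
    }

    // Plays the role of beforeDeserialization: reads the tenant id, looks up
    // its key, and returns the plain serialized bytes for the normal decoder.
    byte[] beforeDeserialization(byte[] envelope) throws GeneralSecurityException {
        ByteBuffer in = ByteBuffer.wrap(envelope);
        byte[] id = new byte[in.getInt()];
        in.get(id);
        byte[] iv = new byte[IV_BYTES];
        in.get(iv);
        byte[] encrypted = new byte[in.remaining()];
        in.get(encrypted);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE,
                keyFor(new String(id, StandardCharsets.UTF_8)),
                new GCMParameterSpec(TAG_BITS, iv));
        cipher.updateAAD(id);
        return cipher.doFinal(encrypted);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        TenantCryptoSketch sketch = new TenantCryptoSketch();
        byte[] record = "serialized record".getBytes(StandardCharsets.UTF_8);
        byte[] envelope = sketch.afterSerialization("tenant-a", record);
        byte[] restored = sketch.beforeDeserialization(envelope);
        System.out.println(new String(restored, StandardCharsets.UTF_8));
    }
}
```

In the actual proposal the tenant id would come from the object being 
serialized, which is why the hooks live on the serializable class itself 
rather than on a separate wrapper around the writer and reader.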

More info can be found, starting from slide 21, at 
https://www.slideshare.net/FlinkForward/multi-tenanted-streams-workday-enrico-agnoli-leire-fernandez-de-retana-roitegui-workday-185815223


What do you think about this proposal? I wanted to start a discussion first, 
but if it helps I can create a patch or a branch to show the change.

Hope to hear from you,
-Enrico
