[ 
https://issues.apache.org/jira/browse/SAMZA-198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942439#comment-13942439
 ] 

Chris Riccomini commented on SAMZA-198:
---------------------------------------

Several initial thoughts:

1. What do you think about just giving the serdes an IncomingMessageEnvelope 
with byte arrays for both the key and value? This is a superset of the 
information you need.
2. Passing the SSP in is somewhat specific to serde'ing messages. The nice 
thing about toBytes and fromBytes right now is that a serde can be used for 
everything (LevelDB serialization, for example), including cases where the 
bytes don't have a SystemStreamPartition associated with them.

Need to think about this a bit more.
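
A rough sketch of the two shapes discussed in points 1 and 2, to make the 
trade-off concrete. Every name here (SerdeSketch, Ssp, RawEnvelope, 
EnvelopeDeserializer, PerStreamDeserializer) is a hypothetical stand-in 
invented for illustration; none of them are Samza classes, and this is not a 
proposed patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class SerdeSketch {
    /** Hypothetical stand-in for a SystemStreamPartition. */
    static final class Ssp {
        final String system;
        final String stream;
        final int partition;

        Ssp(String system, String stream, int partition) {
            this.system = system;
            this.stream = stream;
            this.partition = partition;
        }
    }

    /** Hypothetical stand-in for an envelope whose key and value are still raw bytes. */
    static final class RawEnvelope {
        final Ssp ssp;
        final byte[] key;
        final byte[] value;

        RawEnvelope(Ssp ssp, byte[] key, byte[] value) {
            this.ssp = ssp;
            this.key = key;
            this.value = value;
        }
    }

    /** Context-free shape: usable for messages, stores, or anything else that is just bytes. */
    interface Serde<T> {
        byte[] toBytes(T obj);
        T fromBytes(byte[] bytes);
    }

    /** Message-specific shape: sees the whole envelope, so it knows where the bytes came from. */
    interface EnvelopeDeserializer<T> {
        T fromEnvelope(RawEnvelope envelope);
    }

    /** Trivial context-free serde used to exercise both shapes. */
    static final class StringSerde implements Serde<String> {
        public byte[] toBytes(String s) {
            return s.getBytes(StandardCharsets.UTF_8);
        }

        public String fromBytes(byte[] bytes) {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    /** Picks a context-free serde by the stream name found in the envelope's SSP. */
    static final class PerStreamDeserializer<T> implements EnvelopeDeserializer<T> {
        private final Map<String, Serde<T>> serdesByStream = new HashMap<>();

        void register(String stream, Serde<T> serde) {
            serdesByStream.put(stream, serde);
        }

        public T fromEnvelope(RawEnvelope envelope) {
            Serde<T> serde = serdesByStream.get(envelope.ssp.stream);
            if (serde == null) {
                throw new IllegalArgumentException(
                    "No serde registered for stream " + envelope.ssp.stream);
            }
            return serde.fromBytes(envelope.value);
        }
    }
}
```

In this sketch the envelope-aware layer wraps the context-free serdes instead 
of replacing them, so a store that has no SSP can keep calling 
toBytes/fromBytes directly.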

> Provide SystemStreamPartition info to SerDe fromBytes/toBytes methods
> ---------------------------------------------------------------------
>
>                 Key: SAMZA-198
>                 URL: https://issues.apache.org/jira/browse/SAMZA-198
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Jakob Homan
>
> Right now the Deserializer fromBytes method takes just a byte array, meaning 
> that it doesn't know anything about where those bytes came from.
> We have a use case with Avro messages coming from Kafka where we may be 
> getting several different versions of the same schema (each different version 
> coming from a different stream-partition).  This works okay.  However, in the 
> same stream task, we're actually consuming from more than one type of Avro 
> message and each of those types has that same situation.
> Once we're in the process method we can take the generic record and poke it 
> for its internal structure to see what type and version it is.  At this point 
> we can re-encode it if necessary to bring its schema version up to the latest 
> before sending it on.  However, this extra work is expensive and is 
> dominating the time spent in the process method.
> However, if at deserialization time we knew what SSP the message came from, 
> we could provide the Avro GenericDatumReader the reader schema, thus saving 
> the expensive re-encode step in the process method.
> I imagine other systems could benefit from this extra info as well.  The 
> information is available in the IncomingMessageEnvelope when we call the 
> deserializer; it's just not being passed in.
> (A parallel argument applies to the toBytes method in the Serializer 
> interface.)



--
This message was sent by Atlassian JIRA
(v6.2#6252)
