[ https://issues.apache.org/jira/browse/SAMZA-198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942439#comment-13942439 ]
Chris Riccomini commented on SAMZA-198:
---------------------------------------
Several initial thoughts:
1. What do you think about just giving the serdes an IncomingMessageEnvelope
with byte arrays for both the key and the value? That is a superset of the
information you need.
2. This change is somewhat specific to serializing and deserializing messages.
The nice thing about toBytes and fromBytes right now is that they make a serde
usable for everything (e.g. LevelDB serialization), including cases where the
bytes don't have a SystemStreamPartition associated with them.
Need to think about this a bit more.
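
To make the trade-off between the two thoughts above concrete, here is a minimal sketch in Java. It is not Samza's actual API: StreamPartition, RawEnvelope, EnvelopeAwareSerde, StringSerde and StreamTaggingSerde are all hypothetical stand-ins invented for illustration. The idea it shows is one possible compromise, assuming Java 8 default methods are acceptable: keep the context-free toBytes/fromBytes pair, and add an envelope-based entry point that falls back to it.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for Samza's SystemStreamPartition (not the real class).
final class StreamPartition {
    final String system;
    final String stream;
    final int partition;

    StreamPartition(String system, String stream, int partition) {
        this.system = system;
        this.stream = stream;
        this.partition = partition;
    }
}

// Hypothetical envelope that still holds the serialized key and message bytes.
final class RawEnvelope {
    final StreamPartition source;
    final byte[] keyBytes;
    final byte[] messageBytes;

    RawEnvelope(StreamPartition source, byte[] keyBytes, byte[] messageBytes) {
        this.source = source;
        this.keyBytes = keyBytes;
        this.messageBytes = messageBytes;
    }
}

// Context-free serde in the toBytes/fromBytes shape discussed above.
interface Serde<T> {
    byte[] toBytes(T obj);

    T fromBytes(byte[] bytes);
}

// Hypothetical extension: an envelope-based entry point that defaults to the
// context-free method, so serdes used where no partition exists are unaffected.
interface EnvelopeAwareSerde<T> extends Serde<T> {
    default T fromEnvelope(RawEnvelope envelope) {
        return fromBytes(envelope.messageBytes);
    }
}

// Ignores the source entirely; equally usable for a key-value store.
class StringSerde implements EnvelopeAwareSerde<String> {
    @Override
    public byte[] toBytes(String obj) {
        return obj.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public String fromBytes(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}

// Uses the source: prefixes the decoded value with the stream it came from.
class StreamTaggingSerde extends StringSerde {
    @Override
    public String fromEnvelope(RawEnvelope envelope) {
        return envelope.source.stream + ":" + fromBytes(envelope.messageBytes);
    }
}
```

In this sketch a caller that has an envelope would call fromEnvelope, while a store with bare bytes keeps calling fromBytes. Whether that split is the right one for Samza is exactly the open question in point 2.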
> Provide SystemStreamPartition info to SerDe fromBytes/toBytes methods
> ---------------------------------------------------------------------
>
> Key: SAMZA-198
> URL: https://issues.apache.org/jira/browse/SAMZA-198
> Project: Samza
> Issue Type: Bug
> Reporter: Jakob Homan
>
> Right now the Deserializer fromBytes method takes just a byte array, meaning
> that it doesn't know anything about where those bytes came from.
> We have a use case with Avro messages coming from Kafka where we may be
> getting several different versions of the same schema (each different version
> coming from a different stream-partition). This works okay. However, in the
> same stream task, we're actually consuming from more than one type of Avro
> message and each of those types has that same situation.
> Once we're in the process method we can take the generic record and poke it
> for its internal structure to see what type and version it is. At this point
> we can re-encode it if necessary to bring its schema version up to the latest
> before sending it on, but this extra work is expensive and dominates the
> time spent in the process method.
> If we knew at deserialization time which SSP the message came from, we could
> provide the Avro GenericDatumReader with the reader schema, thus saving the
> expensive re-encode step in the process method.
> I imagine other systems could benefit from this extra info as well. The
> information is available in the IncomingMessageEnvelope when we call the
> deserializer; it's just not being passed in.
> (A parallel argument applies to the toBytes method in the Serializer
> interface.)
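
The request in the issue description above can be sketched as a deserializer that picks its decoding logic from the stream a message came from, so every record reaches the process method already in its latest shape. This is a simplified illustration and does not use Avro: PageView, PageViewDecoders, PerStreamDeserializer, the stream names and the toy "user" / "user,country" payload formats are all hypothetical. With Avro, the per-stream function would presumably wrap a datum reader configured with the appropriate schemas.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical "latest" record shape; stands in for a record read with the
// newest reader schema.
final class PageView {
    final String user;
    final String country; // field that only exists in the hypothetical v2 format

    PageView(String user, String country) {
        this.user = user;
        this.country = country;
    }
}

// Toy decoders for two versions of the same message type.
final class PageViewDecoders {
    private PageViewDecoders() {}

    // v1 payload is just "user"; the missing field gets a default value.
    static PageView v1(byte[] bytes) {
        return new PageView(new String(bytes, StandardCharsets.UTF_8), "unknown");
    }

    // v2 payload is "user,country".
    static PageView v2(byte[] bytes) {
        String[] parts = new String(bytes, StandardCharsets.UTF_8).split(",", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected 'user,country' payload");
        }
        return new PageView(parts[0], parts[1]);
    }
}

// Chooses a decoder by stream name at deserialization time, so no re-encode
// step is needed later.
final class PerStreamDeserializer {
    private final Map<String, Function<byte[], PageView>> decoders = new HashMap<>();

    void register(String stream, Function<byte[], PageView> decoder) {
        decoders.put(stream, decoder);
    }

    PageView fromBytes(byte[] bytes, String stream) {
        Function<byte[], PageView> decoder = decoders.get(stream);
        if (decoder == null) {
            throw new IllegalArgumentException("No decoder registered for stream: " + stream);
        }
        return decoder.apply(bytes);
    }
}
```

A task consuming both streams would register PageViewDecoders::v1 and PageViewDecoders::v2 under their stream names once, then pass the stream name along with each payload. The map lookup replaces the inspect-and-re-encode work described above.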