[ 
https://issues.apache.org/jira/browse/SAMZA-484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14227253#comment-14227253
 ] 

Milinda Lakmal Pathirage commented on SAMZA-484:
------------------------------------------------

In addition to settling the wire format, we should decide how to describe a 
tuple in our relational algebra and define an API to represent it. 

A naive tuple description could be a map from each field/column name to its 
type plus extra metadata, with a few operations over this map such as "get 
the type of a field" and "get all fields". The extra metadata would include: 

- whether null values are allowed
- the default value
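
To make the description above concrete, here is a minimal Java sketch of such 
a map. All names (TupleSchema, Field, addField, getFieldType, getFieldNames) 
are hypothetical and not part of any existing Samza API, and using Java 
classes as the field types is only an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types exist in Samza.
final class TupleSchema {

    /** Per-field description: the type plus the extra metadata listed above. */
    static final class Field {
        final Class<?> type;
        final boolean nullable;
        final Object defaultValue; // null means "no default value"

        Field(Class<?> type, boolean nullable, Object defaultValue) {
            if (defaultValue != null && !type.isInstance(defaultValue)) {
                throw new IllegalArgumentException(
                    "Default value is not of type " + type.getName());
            }
            this.type = type;
            this.nullable = nullable;
            this.defaultValue = defaultValue;
        }
    }

    // LinkedHashMap keeps fields in declaration order.
    private final Map<String, Field> fields = new LinkedHashMap<>();

    TupleSchema addField(String name, Class<?> type, boolean nullable, Object defaultValue) {
        fields.put(name, new Field(type, nullable, defaultValue));
        return this;
    }

    /** "Get the type of a field". */
    Class<?> getFieldType(String name) {
        return require(name).type;
    }

    boolean isNullable(String name) {
        return require(name).nullable;
    }

    Object getDefaultValue(String name) {
        return require(name).defaultValue;
    }

    /** "Get all fields", returned as a read-only list of names. */
    List<String> getFieldNames() {
        return Collections.unmodifiableList(new ArrayList<>(fields.keySet()));
    }

    private Field require(String name) {
        Field f = fields.get(name);
        if (f == null) {
            throw new IllegalArgumentException("Unknown field: " + name);
        }
        return f;
    }
}
```

A schema for a two-field tuple could then be built with 
new TupleSchema().addField("userId", Long.class, false, null)
.addField("country", String.class, true, "unknown").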

Based on this description, we can then implement a tuple API that retrieves 
a field value by field name with the correct type. In Freshet, I had API 
methods such as getStringField and getLongField for this purpose. In 
addition, there should be a way to handle default values in cases where some 
values are missing from the original tuple. 

One possible way to handle default values is to modify the original tuple at 
the entry point, filling in the missing fields there. But I'm not sure 
whether this is the ideal way to handle them.
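
As a rough illustration of the typed accessors and the entry-point option, 
here is a minimal Java sketch. Only the accessor names getStringField and 
getLongField come from the comment above; the Tuple class, the withDefaults 
factory, and the plain map of defaults standing in for the tuple description 
are assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: Tuple and withDefaults are illustrative names only.
final class Tuple {
    private final Map<String, Object> values;

    private Tuple(Map<String, Object> values) {
        this.values = values;
    }

    /**
     * Entry-point option: fill in defaults for missing fields once, when the
     * tuple enters the system. The raw map is copied, not modified in place.
     */
    static Tuple withDefaults(Map<String, Object> raw, Map<String, Object> defaults) {
        Map<String, Object> filled = new HashMap<>(defaults);
        filled.putAll(raw); // keys present in the raw tuple override defaults
        return new Tuple(filled);
    }

    String getStringField(String name) {
        return get(name, String.class);
    }

    long getLongField(String name) {
        return get(name, Long.class);
    }

    // Looks up a field and checks its runtime type before returning it.
    private <T> T get(String name, Class<T> type) {
        Object value = values.get(name);
        if (value == null) {
            throw new IllegalArgumentException("Missing field: " + name);
        }
        if (!type.isInstance(value)) {
            throw new ClassCastException(
                "Field " + name + " is not a " + type.getSimpleName());
        }
        return type.cast(value);
    }
}
```

With defaults applied at the entry point, downstream operators never see a 
missing field. The cost is that they can no longer tell a default from a 
value that was actually present in the original tuple, which may be one 
reason to prefer resolving defaults at read time instead.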






> Define the serialization/deserialization format for stream tuple
> ----------------------------------------------------------------
>
>                 Key: SAMZA-484
>                 URL: https://issues.apache.org/jira/browse/SAMZA-484
>             Project: Samza
>          Issue Type: Sub-task
>            Reporter: Yi Pan (Data Infrastructure)
>            Priority: Minor
>              Labels: project
>
> It came out in the discussion for streaming SQL that we will need to define 
> the serialization/deserialization format for stream tuple.
> The ideal serialization/deserialization format should allow both forward and 
> backward compatibility on additional/missing fields in the data.
> Several choices to be considered:
> 1) Avro
> 2) Protobuf
> 3) Flatbuffer
> It might also be interesting to consider a pluggable serialization interface 
> that allows different serialization methods for different Samza jobs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
