Dear All:
I need to produce some data from Samza to Kafka and then write it to
Parquet-format files.
I was asked why I chose Avro as my Samza output format to Kafka
instead of Protocol Buffers, since all of our data on Kafka is currently
Protocol Buffers messages.
I explained that Avro-encoded messages have several advantages: the
encoded size is smaller, no extra code compilation is needed, the
implementation is easier, serialization/deserialization is fast, and
many languages are supported.
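To make the "smaller encoded size" point concrete, here is a minimal sketch that hand-encodes the same hypothetical two-field record (user_id, name) following the Avro binary encoding and the protobuf wire format. It does not use the official Avro or protobuf libraries; the record, field numbers and the choice of sint64 on the protobuf side are assumptions for illustration. The difference comes from Avro writing fields in schema order with no per-field tags, while protobuf prefixes every field with a key.

```python
def zigzag(n: int) -> int:
    """Map a signed 64-bit int to unsigned (Avro long and protobuf sint64 both do this)."""
    return (n << 1) ^ (n >> 63)


def varint(u: int) -> bytes:
    """Base-128 varint: 7 bits per byte, lowest group first, high bit means 'more follows'."""
    out = bytearray()
    while True:
        byte = u & 0x7F
        u >>= 7
        if u:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def avro_record(user_id: int, name: str) -> bytes:
    """Avro binary encoding: fields concatenated in schema order, no field tags.
    A long is a zigzag varint; a string is a zigzag-varint length plus UTF-8 bytes."""
    data = name.encode("utf-8")
    return varint(zigzag(user_id)) + varint(zigzag(len(data))) + data


def pb_record(user_id: int, name: str) -> bytes:
    """Protobuf wire format for a hypothetical message
    { sint64 user_id = 1; string name = 2; }.
    Each field is preceded by a key: (field_number << 3) | wire_type."""
    data = name.encode("utf-8")
    return (
        varint((1 << 3) | 0) + varint(zigzag(user_id))      # field 1, wire type 0 (varint)
        + varint((2 << 3) | 2) + varint(len(data)) + data   # field 2, wire type 2 (length-delimited)
    )


a = avro_record(42, "Selina")
p = pb_record(42, "Selina")
print(len(a), len(p))  # prints: 8 10
```

The two extra protobuf bytes are the field keys, so the gap grows with the number of fields and matters less when fields are long strings. The trade-off is that an Avro reader cannot decode these bytes at all without the writer's schema, whereas the protobuf keys let a reader skip fields it does not know.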
However, some people believe that once encoded, an Avro message takes
as much space as a Protocol Buffers message, and that with the schema
included, the size could be much bigger.
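Whether the schema makes Avro "much bigger" depends on how often the schema travels with the data. The sketch below is back-of-the-envelope arithmetic, not a benchmark: it compares the average bytes per message when the full schema is prepended to every message, when it is written once for a batch of records (the idea behind an Avro data file header, ignoring its magic bytes, metadata and sync markers), and when each message carries only a fixed-size schema reference. The schema, the 8-byte payload and the 8-byte reference size are assumptions for illustration.

```python
import json

# Hypothetical schema for a small two-field record (illustration only).
SCHEMA = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "user_id", "type": "long"},
        {"name": "name", "type": "string"},
    ],
}


def schema_len(schema: dict) -> int:
    """Length of the schema serialized as compact JSON text, in bytes."""
    return len(json.dumps(schema, separators=(",", ":")).encode("utf-8"))


def avg_bytes_per_message(payload: int, n_messages: int, strategy: str,
                          schema_bytes: int, ref_bytes: int = 8) -> float:
    """Average bytes per message for three ways of shipping the schema.

    embed_each: full schema text prepended to every message.
    once:       schema written a single time for n_messages records.
    reference:  a fixed-size schema ID/fingerprint per message
                (ref_bytes is an assumed size, not a standard value).
    """
    if strategy == "embed_each":
        return payload + schema_bytes
    if strategy == "once":
        return payload + schema_bytes / n_messages
    if strategy == "reference":
        return payload + ref_bytes
    raise ValueError(f"unknown strategy: {strategy}")


s = schema_len(SCHEMA)
for strategy in ("embed_each", "once", "reference"):
    print(strategy, avg_bytes_per_message(8, 10_000, strategy, s))
```

For a tiny record, embedding the schema in every message dominates the size, which matches the concern above; writing it once per file or sending only a short reference keeps the overhead to a few bytes or less per message.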
I am wondering whether there are any other advantages that made you
choose Avro as your message type. How do you weigh data size for Avro
vs. Protocol Buffers?
Sincerely,
Selina
References:
1. https://issues.apache.org/jira/browse/SAMZA-317
2. http://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html
3. https://avro.apache.org/docs/1.7.7/gettingstartedjava.html
4. https://www.igvita.com/2011/08/01/protocol-buffers-avro-thrift-messagepack/
5. http://tech.puredanger.com/2011/05/27/serialization-comparison/