David Kay created KAFKA-3744:
--------------------------------
Summary: Message format needs to identify serializer
Key: KAFKA-3744
URL: https://issues.apache.org/jira/browse/KAFKA-3744
Project: Kafka
Issue Type: Improvement
Reporter: David Kay
Priority: Minor
https://issues.apache.org/jira/browse/KAFKA-3698 was recently resolved with
https://github.com/apache/kafka/commit/27a19b964af35390d78e1b3b50bc03d23327f4d0.
However, the Kafka documentation on message formats needs to be more explicit for
new users. Section 1.3, Step 4 ("Send some messages") takes lines of text from the
command line, and slide 104 of a beginner's guide
(http://www.slideshare.net/miguno/apache-kafka-08-basic-training-verisign) says:
{noformat}
Kafka does not care about data format of msg payload
Up to developer to handle serialization/deserialization
Common choices: Avro, JSON
{noformat}
If one producer sends lines of console text, a second sends Avro, a third sends
JSON, and a fourth sends CBOR, how does the consumer identify which deserializer
to use for each payload? The commit describes an opaque K-byte key that could
carry a codec identifier, but it provides no guidance on how to use it:
{quote}
"Leaving the key and value opaque is the right decision: there is a great deal
of progress being made on serialization libraries right now, and any particular
choice is unlikely to be right for all uses. Needless to say a particular
application using Kafka would likely mandate a particular serialization type as
part of its usage."
{quote}
Mandating any particular serialization is as unrealistic as mandating a single
MIME type for all web content. There must be a way to signal which serialization
produced a message's V-byte payload. Documenting even a rudimentary codec registry
with a few values (text, Avro, JSON, CBOR) would establish the pattern for future
serialization libraries.
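To illustrate the kind of registry proposed above, here is a minimal Python sketch. Everything in it is hypothetical: the `Codec` names, the numeric IDs, the helper functions, and the choice to carry the ID as a one-byte prefix on the value (rather than inside the key) are assumptions for illustration, not anything Kafka defines. It shows a producer tagging a payload and a consumer using that tag to pick a deserializer.

```python
from __future__ import annotations

import json
from enum import IntEnum


# Hypothetical codec registry. The numeric IDs are illustrative
# assumptions, not values defined by Kafka.
class Codec(IntEnum):
    TEXT = 1
    AVRO = 2
    JSON = 3
    CBOR = 4


def encode_value(codec: Codec, payload: bytes) -> bytes:
    """Producer side: prefix the serialized payload with a one-byte codec ID."""
    return bytes([codec]) + payload


def decode_value(value: bytes) -> tuple[Codec, bytes]:
    """Consumer side: split a tagged value into its codec and raw payload."""
    if not value:
        raise ValueError("empty value: no codec identifier present")
    try:
        codec = Codec(value[0])
    except ValueError:
        raise ValueError(f"unknown codec id {value[0]}") from None
    return codec, value[1:]


# Deserializers keyed by codec. Avro and CBOR would need third-party
# libraries, so they are deliberately left out of this sketch.
DESERIALIZERS = {
    Codec.TEXT: lambda b: b.decode("utf-8"),
    Codec.JSON: lambda b: json.loads(b),
}


def deserialize(value: bytes):
    """Look up the codec tag and dispatch to the matching deserializer."""
    codec, payload = decode_value(value)
    try:
        fn = DESERIALIZERS[codec]
    except KeyError:
        raise NotImplementedError(
            f"no deserializer registered for {codec.name}"
        ) from None
    return fn(payload)
```

One reason to tag the value rather than the key in this sketch is that the key stays free for partitioning. The trade-off is that every producer and consumer on a topic must agree on the same ID table, which is what a documented registry would provide.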
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)