[ https://issues.apache.org/jira/browse/KAFKA-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297473#comment-15297473 ]
ASF GitHub Bot commented on KAFKA-3744:
---------------------------------------

GitHub user davek2 opened a pull request:

    https://github.com/apache/kafka/pull/1419

    Allocate 2 attribute bits to signal payload format

    This documentation update proposes a mechanism to signal the
    serialization used for the message payload, resolving issue
    https://issues.apache.org/jira/browse/KAFKA-3744.

    No change is made to the message structure; two previously-reserved
    bits in the attribute byte now have defined values, and for one of
    four cases the key field is defined to be a JSON object.

    No change is required to messaging software. No change is required
    to existing producer and consumer clients that use pre-agreed
    payload serialization.

    Misc notes:

    1) Only one attribute bit would be needed if serialization were
       always signalled using the key field. But it seems preferable to
       define two common serializations that do not have any dependency
       on the key field. Selection of the common formats is arbitrary;
       text and avro seem reasonable but any two could be used instead.

    2) The compression attribute uses three bits but only two are
       defined. If the intent is to use all three bits for compression
       the undefined values should be listed as reserved; if not, the
       timestamp attribute can slide down to bit 2 and serialization to
       bits 3~4, leaving bits 5~7 reserved.

    3) It's unclear why message field 6 should be called "key" - a
       variable-length field is more likely to be described as
       "attributes" or "metadata", and 1-byte field 3 would be called
       "flags" instead of "attributes".

    4) Field 8 is called "payload" under message format and "value"
       under on-disk format. Changed to payload in both places.
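
As a rough illustration of the proposal above, the following Python sketch packs and unpacks a 2-bit payload-format field in the attribute byte. The bit positions are an assumption inferred from note 2 (compression in bits 0~2, timestamp in bit 3, so the format field is placed in bits 4~5), and the numeric values and names such as `FORMAT_IN_KEY` are hypothetical; the pull request text quoted here does not state them.

```python
# Assumed layout of the 1-byte attributes field (not confirmed by the PR text):
#   bits 0-2: compression codec
#   bit  3  : timestamp type
#   bits 4-5: payload format (the two newly allocated bits)
#   bits 6-7: reserved
COMPRESSION_MASK = 0x07
TIMESTAMP_TYPE_MASK = 0x08
PAYLOAD_FORMAT_SHIFT = 4
PAYLOAD_FORMAT_MASK = 0x03 << PAYLOAD_FORMAT_SHIFT  # 0x30

# Hypothetical values for the four cases. Zero is assumed to mean
# "pre-agreed", so existing clients that leave the bits clear are unaffected.
FORMAT_UNSPECIFIED = 0
FORMAT_TEXT = 1
FORMAT_AVRO = 2
FORMAT_IN_KEY = 3  # key field is a JSON object describing the serialization


def set_payload_format(attributes: int, fmt: int) -> int:
    """Return the attribute byte with the payload-format bits set to fmt."""
    if not 0 <= fmt <= 3:
        raise ValueError("payload format must fit in two bits")
    cleared = attributes & ~PAYLOAD_FORMAT_MASK & 0xFF
    return cleared | (fmt << PAYLOAD_FORMAT_SHIFT)


def get_payload_format(attributes: int) -> int:
    """Extract the 2-bit payload-format value from the attribute byte."""
    return (attributes & PAYLOAD_FORMAT_MASK) >> PAYLOAD_FORMAT_SHIFT
```

Because only the two format bits are masked, the compression and timestamp bits pass through unchanged, which is what lets existing clients keep working.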

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davek2/kafka trunk

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/1419.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1419

----
commit 1d88b8d48cdfe67989bebf239f7588ca24e961b6
Author: Joe <dav...@hotmail.com>
Date:   2016-05-24T00:32:04Z

    Allocate 2 attribute bits for payload format

----

> Message format needs to identify serializer
> -------------------------------------------
>
>                 Key: KAFKA-3744
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3744
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: David Kay
>            Priority: Minor
>
> https://issues.apache.org/jira/browse/KAFKA-3698 was recently resolved with
> https://github.com/apache/kafka/commit/27a19b964af35390d78e1b3b50bc03d23327f4d0.
> But Kafka documentation on message formats needs to be more explicit for new
> users. Section 1.3 Step 4 says: "Send some messages" and takes lines of text
> from the command line. Beginner's guide
> (http://www.slideshare.net/miguno/apache-kafka-08-basic-training-verisign)
> Slide 104 says:
> {noformat}
> Kafka does not care about data format of msg payload
> Up to developer to handle serialization/deserialization
> Common choices: Avro, JSON
> {noformat}
> If one producer sends lines of console text, another producer sends Avro, a
> third producer sends JSON, and a fourth sends CBOR, how does the consumer
> identify which deserializer to use for the payload? The commit includes an
> opaque K byte Key that could potentially include a codec identifier, but
> provides no guidance on how to use it:
> {quote}
> "Leaving the key and value opaque is the right decision: there is a great
> deal of progress being made on serialization libraries right now, and any
> particular choice is unlikely to be right for all uses. Needless to say a
> particular application using Kafka would likely mandate a particular
> serialization type as part of its usage."
> {quote}
> Mandating any particular serialization is as unrealistic as mandating a
> single mime-type for all web content. There must be a way to signal the
> serialization used to produce this message's V byte payload, and documenting
> the existence of even a rudimentary codec registry with a few values (text,
> Avro, JSON, CBOR) would establish the pattern to be used for future
> serialization libraries.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
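
The "rudimentary codec registry" that the issue asks for could look roughly like the following consumer-side sketch in Python. The numeric identifiers and the names `DESERIALIZERS` and `deserialize` are hypothetical; the issue lists the formats (text, Avro, JSON, CBOR) but assigns no values. Only text and JSON are registered here because they need nothing beyond the standard library; Avro and CBOR would be added the same way with a third-party decoder.

```python
import json
from typing import Callable, Dict

# Hypothetical codec identifiers; the issue names the formats only.
CODEC_TEXT = 0
CODEC_AVRO = 1
CODEC_JSON = 2
CODEC_CBOR = 3

# Registry mapping a codec identifier to a function that decodes the payload.
DESERIALIZERS: Dict[int, Callable[[bytes], object]] = {
    CODEC_TEXT: lambda payload: payload.decode("utf-8"),
    CODEC_JSON: lambda payload: json.loads(payload.decode("utf-8")),
    # CODEC_AVRO and CODEC_CBOR are intentionally left unregistered in this sketch.
}


def deserialize(codec_id: int, payload: bytes) -> object:
    """Pick the deserializer signalled by codec_id and apply it to payload."""
    try:
        decoder = DESERIALIZERS[codec_id]
    except KeyError:
        raise ValueError(f"no deserializer registered for codec id {codec_id}") from None
    return decoder(payload)
```

A consumer reading a mixed topic would call `deserialize(codec_id, payload)` for each message and could treat the `ValueError` as a signal to skip or dead-letter payloads in formats it does not understand.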