Jay Kreps created KAFKA-544:
-------------------------------

             Summary: Retain key in producer
                 Key: KAFKA-544
                 URL: https://issues.apache.org/jira/browse/KAFKA-544
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 0.8
            Reporter: Jay Kreps
            Assignee: Jay Kreps


KAFKA-506 added support for retaining a key in the messages, however this field 
is not yet set by the producer.

The proposal for doing this is to change the producer api to change 
ProducerData to allow only a single key/value pair so it has a one-to-one 
mapping to Message. That is change from
  ProducerData(topic: String, key: K, data: Seq[V])
to
  ProducerData(topic: String, key: K, data: V)

The key itself needs to be encoded. There are several ways this could be 
handled. A few of the options:
1. Change the Encoder and Decoder to be MessageEncoder and MessageDecoder and 
have them take both a key and value.
2. Another option is to change the type of the encoder/decoder to not refer to 
Message so it could be used for both the key and value.

I favor the second option but am open to feedback.

One concern with our current approach to serialization as well as both of these 
proposals is that they are inefficient. We go from 
Object=>byte[]=>Message=>MessageSet with a copy at each step. In the case of 
compression there are a bunch of intermediate steps. We could theoretically 
clean this up by instead having an interface for the encoder that was something 
like
   Encoder.writeTo(buffer: ByteBuffer, object: AnyRef)
and
   Decoder.readFrom(buffer:ByteBuffer): AnyRef
However there are two problems with this. The first is that we don't actually 
know the size of the data until  it is serialized so we can't really allocate 
the bytebuffer properly and might need to resize it. The second is that in the 
case of compression there is a whole other path to consider. Originally I 
thought maybe it would be good to try to fix this, but now I think it should be 
out-of-scope and we should revisit the efficiency issue in a future release in 
conjunction with our internal handling of compression.





--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to