[ 
https://issues.apache.org/jira/browse/KAFKA-544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Kreps updated KAFKA-544:
----------------------------

    Attachment: KAFKA-544-v4.patch

Final patch fixes a bug that affected the system tests. This patch is ready for 
review.

To summarize, here are the changes listed in one place:
1. Change encoder/decoder to 
     def toBytes(t: T)
     def fromBytes(bytes: Array[Byte]): T
I also took the opportunity to pass properties into the encoders and decoders, 
so they now require a constructor that takes a VerifiableProperties. This 
allows for things like a schema registry URL, character encoding, etc.
2. Rename ProducerData to KeyedMessage and make it only contain a single 
key-value pair.
3. Add a new property for the producer, key.serializer.class to complement the 
already existing serializer.class. The key serializer defaults to the same 
value as the value serializer since I think that will be common (e.g. both 
Avro).
4. ConsumerConnector now requires two decoders, one for the key and one for the 
value. The type of the resulting stream is now [K,V] rather than just [T].
5. Exposed partition and offset in the MessageAndMetadata class.
6. Changed unit tests to mostly use strings instead of Message or byte[].
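
For readers skimming the list, items 1-3 can be sketched roughly as follows. This is an illustration only, not the code in the patch: java.util.Properties stands in for VerifiableProperties, the Array[Byte] return type on toBytes is inferred, and the string codec, the "serializer.encoding" property name, the KeyedMessage field names, and the ProducerDefaults helper are all assumptions made for the example.

```scala
import java.util.Properties

// Item 1: encoder/decoder shapes that no longer refer to Message,
// so the same traits can serve both keys and values.
trait Encoder[T] {
  def toBytes(t: T): Array[Byte]
}

trait Decoder[T] {
  def fromBytes(bytes: Array[Byte]): T
}

// Hypothetical string codec. The constructor takes properties so that
// settings such as character encoding can be passed in.
// "serializer.encoding" is an assumed property name.
class StringEncoder(props: Properties) extends Encoder[String] {
  private val encoding = props.getProperty("serializer.encoding", "UTF-8")
  def toBytes(s: String): Array[Byte] = s.getBytes(encoding)
}

class StringDecoder(props: Properties) extends Decoder[String] {
  private val encoding = props.getProperty("serializer.encoding", "UTF-8")
  def fromBytes(bytes: Array[Byte]): String = new String(bytes, encoding)
}

// Item 2: exactly one key-value pair per message (field names assumed).
case class KeyedMessage[K, V](topic: String, key: K, message: V)

// Item 3: the key serializer falls back to the value serializer when
// key.serializer.class is not set. The helper itself is hypothetical.
object ProducerDefaults {
  def keySerializerClass(props: Properties): String =
    props.getProperty("key.serializer.class",
                      props.getProperty("serializer.class"))
}
```

Because the traits are generic over T rather than tied to Message, one implementation (for example an Avro codec) can be configured once and reused for both key and value, which is the motivation for the default in item 3.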

This code is ready for review (por favor).

We also need to make a call on whether we want this in 0.8. It is not a very 
tricky change, but it does touch a lot of files. Since it is a compatibility 
change it would be nice to do it in 0.8, but it is awfully late in the game for 
this...
                
> Retain key in producer and expose it in the consumer
> ----------------------------------------------------
>
>                 Key: KAFKA-544
>                 URL: https://issues.apache.org/jira/browse/KAFKA-544
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8
>            Reporter: Jay Kreps
>            Assignee: Jay Kreps
>            Priority: Blocker
>              Labels: bugs
>         Attachments: KAFKA-544-v1.patch, KAFKA-544-v2.patch, 
> KAFKA-544-v3.patch, KAFKA-544-v4.patch
>
>
> KAFKA-506 added support for retaining a key in the messages, however this 
> field is not yet set by the producer.
> The proposal for doing this is to change the producer api to change 
> ProducerData to allow only a single key/value pair so it has a one-to-one 
> mapping to Message. That is change from
>   ProducerData(topic: String, key: K, data: Seq[V])
> to
>   ProducerData(topic: String, key: K, data: V)
> The key itself needs to be encoded. There are several ways this could be 
> handled. A few of the options:
> 1. Change the Encoder and Decoder to be MessageEncoder and MessageDecoder and 
> have them take both a key and value.
> 2. Another option is to change the type of the encoder/decoder to not refer 
> to Message so it could be used for both the key and value.
> I favor the second option but am open to feedback.
> One concern with our current approach to serialization as well as both of 
> these proposals is that they are inefficient. We go from 
> Object=>byte[]=>Message=>MessageSet with a copy at each step. In the case of 
> compression there are a bunch of intermediate steps. We could theoretically 
> clean this up by instead having an interface for the encoder that was 
> something like
>    Encoder.writeTo(buffer: ByteBuffer, object: AnyRef)
> and
>    Decoder.readFrom(buffer:ByteBuffer): AnyRef
> However there are two problems with this. The first is that we don't actually 
> know the size of the data until  it is serialized so we can't really allocate 
> the bytebuffer properly and might need to resize it. The second is that in 
> the case of compression there is a whole other path to consider. Originally I 
> thought maybe it would be good to try to fix this, but now I think it should 
> be out-of-scope and we should revisit the efficiency issue in a future 
> release in conjunction with our internal handling of compression.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
