Hi Tom,

It's an interesting idea.  Protocol Buffers does this for its default integer 
field types (the fixed-width types being the exception).

I have to admit I have some mixed feelings, since this is another thing that 
makes encoding more complex.  And it's not a clear win in all cases, although 
it is in some.

I assume that the performance numbers here are for the old Struct-based 
encoding / decoding system.  Do we know what the numbers are when using the 
generated read and write functions?

I don't think it makes sense for type to be version-dependent.  Type ultimately 
translates into what Java type we should use to represent the field when it's 
in POJO form.  We can't have two types there.

Making encoding version-dependent is reasonable.  I do sort of wonder if we 
should just have something like "packedVersions" : "9+" rather than the 
"encoding" thing, though.  The latter is more conceptually elegant but it seems 
like it would be a pain to use.  For example, for a new field you would have to 
type "encoding": { "0+" : "packed" } which is kind of ugly.

Another thing here is that if we are going to go all this way for optimization, 
we should certainly give people the choice of whether to use zigzag encoding or 
not.  For fields that can never be negative, zigzag encoding is a waste: it 
spends the low bit on the sign, which halves the range of values that fits in 
a given number of bytes.  So you would then have the option of unpacked, signed 
packed (zigzag), and unsigned packed (plain varint).
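
To make the cost concrete, here is a minimal sketch of the two packed variants.  
The function names are made up for illustration and are not the ones in the 
Kafka code base; the varint layout assumed is the usual 7-bits-per-byte, 
least-significant-group-first form.

```python
def encode_unsigned_varint(value: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low group first."""
    if value < 0:
        raise ValueError("unsigned varint cannot encode a negative value")
    out = bytearray()
    while value >= 0x80:
        # Set the continuation bit: more bytes follow.
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def zigzag32(value: int) -> int:
    """Map a signed 32-bit integer to unsigned: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF


def encode_signed_varint(value: int) -> bytes:
    """'Signed packed': zigzag first, then the unsigned varint."""
    return encode_unsigned_varint(zigzag32(value))


# 64 fits in one byte unsigned, but zigzag doubles it to 128, which needs two.
unsigned_len = len(encode_unsigned_varint(64))
signed_len = len(encode_signed_varint(64))
```

With this layout, any non-negative value from 64 to 127 takes one byte as 
unsigned packed and two bytes as signed packed.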

Finally, the Kafka protocol has a lot of fields which can never be negative, 
except for some special cases where they are -1.  But no other negative numbers 
are allowed.  So we should consider making the "unsigned packed" option 
actually encode num + 1, so that -1 goes on the wire as 0, just to support this 
usage.  That's what we did for string and bytes prefix length encoding and it 
worked well.
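
A rough sketch of the num + 1 idea, under the same assumed varint layout as the 
usual unsigned varint.  The names encode_shifted and decode_shifted are 
hypothetical, chosen only for this example.

```python
def encode_unsigned_varint(value: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low group first."""
    if value < 0:
        raise ValueError("unsigned varint cannot encode a negative value")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def decode_unsigned_varint(data: bytes) -> int:
    """Decode one unsigned varint from the start of data."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result
        shift += 7
    raise ValueError("truncated varint")


def encode_shifted(value: int) -> bytes:
    """Encode a field whose only legal negative value is -1 as value + 1."""
    if value < -1:
        raise ValueError("only -1 is allowed as a negative value")
    return encode_unsigned_varint(value + 1)


def decode_shifted(data: bytes) -> int:
    """Undo the + 1 shift applied by encode_shifted."""
    return decode_unsigned_varint(data) - 1
```

Here -1 costs a single zero byte, and the one-byte range for real values is 0 
through 126 rather than the 0 through 63 that zigzag would give.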

Before we do all this, though, one simpler improvement would be making all the 
"error" fields into tagged fields.  Most of them remain at 0 most of the time, 
so this could very well provide a big savings without any big encoding changes.
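
As a back-of-the-envelope check, here is a small sketch of the expected size of 
one error code field.  It assumes the error code is an int16, that a tagged 
field costs a one-byte tag plus a one-byte size prefix ahead of its data (true 
only for small tag numbers), and that a field left at its default is omitted 
entirely.  The per-structure tagged field count is ignored, on the assumption 
that flexible versions already pay for it.

```python
INT16_BYTES = 2   # fixed-width error code as a regular field
TAG_BYTES = 1     # assumed: tag number small enough for a one-byte varint
SIZE_BYTES = 1    # assumed: size prefix fits in a one-byte varint


def expected_error_code_bytes(p_nonzero: float) -> float:
    """Expected bytes per message for a tagged int16 error code.

    p_nonzero is the fraction of messages carrying a non-zero error code.
    A zero (default) value is omitted and costs nothing.
    """
    tagged_bytes_when_present = TAG_BYTES + SIZE_BYTES + INT16_BYTES
    return p_nonzero * tagged_bytes_when_present


# Break-even against the 2-byte regular field is at a 50% error rate.
break_even = expected_error_code_bytes(0.5)
```

Under these assumptions a tagged error code is smaller on average whenever 
fewer than half of the messages carry an error.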

best,
Colin


On Mon, Jun 15, 2020, at 05:34, Tom Bentley wrote:
> Hi,
> 
> I'd like to start discussion on KIP-625: Richer encodings for
> integral-typed protocol fields.
> 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-625%3A+Richer+encodings+for+integral-typed+protocol+fields
> 
> It's about allowing regular/required fields of protocol messages to use
> variable length encoding. If you have a moment please take a look.
> 
> Kind regards,
> 
> Tom
>
