The standard VARINT coder is used for all sorts of integer values (e.g. the
output of the CountElements transform), and the vast majority of them are
likely to need far fewer than the full 64 bits. In Python, declaring an
element type to be int uses this coder. On the other hand, using a VarInt
format for int8 seems quite wasteful. Where to put the cutoff is probably
arbitrary, but the 32-bit int is often used as the generic (and often
small-ish) integer type in Java, whereas int16 is an explicit choice made
when one knows that 16 bits is good enough but 8 isn't.
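
To make the size trade-off concrete, here is a rough sketch of a generic
base-128 varint (7 payload bits per byte, high bit set on every byte but the
last) compared against the fixed widths. It is not Beam's actual coder code:
encode_varint and FIXED_WIDTH are names made up for this illustration, and
negative values are left out because how they are encoded depends on the
coder.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a generic base-128 varint.

    Illustrative helper only, not taken from any Beam SDK.
    """
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        bits = value & 0x7F          # low 7 bits go into this byte
        value >>= 7
        if value:
            out.append(bits | 0x80)  # more bytes follow: set the high bit
        else:
            out.append(bits)         # last byte: high bit clear
            return bytes(out)


# Fixed-width sizes in bytes for the integer types discussed in this thread.
FIXED_WIDTH = {"int8": 1, "int16": 2, "int32": 4, "int64": 8}

if __name__ == "__main__":
    for name, width in FIXED_WIDTH.items():
        max_value = 2 ** (8 * width - 1) - 1  # largest signed value
        print(
            f"{name}: fixed={width} bytes, "
            f"varint(1)={len(encode_varint(1))} bytes, "
            f"varint(max)={len(encode_varint(max_value))} bytes"
        )
```

With this scheme a small value stays at one byte whatever the declared type,
while the largest int32 and int64 values cost one byte more than their fixed
widths. That is why the benefit is concentrated in the wider types.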

It looks like Go uses the VarInt encoding everywhere:
https://github.com/apache/beam/blob/release-2.14.0/sdks/go/pkg/beam/coder.go#L135
Python, as mentioned, uses VarInt encoding everywhere as well.

(There's also the question of whether we want to introduce StandardCoders
for all of these, or if we'd rather move to using Schemas over Coders and
just define them as part of the RowCoder.)




On Tue, Jul 30, 2019 at 8:30 PM Brian Hulette <[email protected]> wrote:

> Forgot to include a link to the code. The mapping from primitive type to
> coders can be found here:
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoder.java#L44
>
> On Tue, Jul 30, 2019 at 11:24 AM Brian Hulette <[email protected]>
> wrote:
>
>> Currently the coders used for integer types in RowCoder (and thus
>> SchemaCoder) are inconsistent. For int32 and int64, we use VarIntCoder and
>> VarLongCoder which encode those types with variable width, but for byte and
>> int16 we use ByteCoder and BigEndianShortCoder, which are fixed width.
>>
>> Is it a conscious choice to use variable width coders just for the larger
>> width integers (where they could have the most benefit), or should we
>> consider normalizing these coders to always be fixed width?
>>
>
