The null fields bitset encoder is defined in the pipeline runner proto
here:
https://github.com/apache/beam/blob/4b11efdf96ea4a471e078ec49906c40ef033aafb/model/pipeline/src/main/proto/beam_runner_api.proto#L976

Per my reading of the spec, the bit set must include the ceiling of
num_fields/8 bytes, as it doesn't say "trailing bytes for non-nil fields
may be dropped". However, it might be read that way because of the
statement that an empty byte array indicates no nils.  This is what Go
implements in the coder.WriteRowHeader and coder.ReadRowHeader functions.

But that strikes me as a special case for fully populated rows, not a
natural extension of a poorly phrased general rule.
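
To make the two readings concrete, here is a minimal sketch in Python. It is
not the Beam implementation: the function names are hypothetical, and the bit
layout (field i stored in byte i // 8 at bit i % 8) is an assumption chosen
for illustration. It covers only the bitset bytes, not the rest of the row
header.

```python
from typing import List, Optional, Sequence


def null_bitset_full(values: Sequence[Optional[object]]) -> bytes:
    """Reading 1: always emit ceil(num_fields / 8) bytes."""
    out = bytearray((len(values) + 7) // 8)
    for i, value in enumerate(values):
        if value is None:
            # Assumed layout: field i lives in byte i // 8, bit i % 8.
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def null_bitset_trimmed(values: Sequence[Optional[object]]) -> bytes:
    """Reading 2: drop trailing all-zero bytes (empty when nothing is null)."""
    return null_bitset_full(values).rstrip(b"\x00")


def decode_null_bitset(data: bytes, num_fields: int) -> List[bool]:
    """Tolerant decoder: bytes missing from the end are treated as zero."""
    return [
        i // 8 < len(data) and bool((data[i // 8] >> (i % 8)) & 1)
        for i in range(num_fields)
    ]


# A 10-field row whose only null is field 1.
row = [0, None, 2, 3, 4, 5, 6, 7, 8, 9]
print(null_bitset_full(row))     # two bytes
print(null_bitset_trimmed(row))  # one byte, trailing zero byte dropped
```

A decoder written like decode_null_bitset accepts either form, while a
decoder that insists on reading ceil(num_fields / 8) bytes would only accept
the first.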

On Tue, Oct 12, 2021, 1:31 PM Reuven Lax <[email protected]> wrote:

> Do you think that BitSetCoder is incorrect?
>
> On Tue, Oct 12, 2021 at 1:27 PM Steve Niemitz <[email protected]> wrote:
>
>> Yeah I believe they're all bugs/missing features in the python
>> implementation.  The nullable BitSet one is arguably a bug in the java
>> implementation, but since there's no low-level spec on how Rows are
>> actually encoded it's hard to say who's right.  I think Go might have the
>> same bug there, in which case that's two languages doing it "wrong" and one
>> doing it "right". :P
>>
>> On Tue, Oct 12, 2021 at 4:20 PM Reuven Lax <[email protected]> wrote:
>>
>>> These are bugs in Python, correct?
>>>
>>> On Tue, Oct 12, 2021 at 1:18 PM Steve Niemitz <[email protected]>
>>> wrote:
>>>
>>>> It seems like there's a good amount of incompatibility between java and
>>>> python wrt beam Rows.  For example the following are unsupported in python
>>>> (that I've noticed so far)
>>>> - BYTE
>>>> - INT16
>>>> - OneOf
>>>>
>>>> Additionally, it seems like nullable fields don't really work
>>>> correctly: the java BitSetCoder won't encode trailing empty bytes in the
>>>> BitSet, but the python side is expecting all ceil(num_fields / 8) bytes
>>>> to be present. [1]
>>>>
>>>> Certainly these are bugs, but more broadly they seem to point to a lack
>>>> of integration testing for xlang interop.  I plan on submitting PRs
>>>> to fix the bugs above (or at least some of them). Are there tests I can
>>>> change to better exercise these paths?
>>>>
>>>> [1]
>>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/coders/row_coder.py#L198
>>>>
>>>
