I agree! The spec should change to accommodate both bitset coder and the
current implementations.  I suspect the ship has sailed on making older
versions of this compatible but it would not be difficult to make the
implementations compatible. But we can make sure the next version is all
synced up.

On Tue, Oct 12, 2021, 1:52 PM Reuven Lax <[email protected]> wrote:

> Of course. But if we write a spec that doesn't agree with the existing
> Java coders, that is also futile. We can't easily change Java coders due to
> update compatibility concerns, and we definitely need portability to work
> with Java!
>
> On Tue, Oct 12, 2021 at 1:47 PM Robert Burke <[email protected]> wrote:
>
>> If it's not in the spec it's not Beam,  because any alternative is Anti
>> Portability ;)
>>
>> On Tue, Oct 12, 2021, 1:45 PM Reuven Lax <[email protected]> wrote:
>>
>>> Row just uses the existing Java BitSetCoder, which predates the writing
>>> of that spec :)
>>>
>>> On Tue, Oct 12, 2021 at 1:42 PM Robert Burke <[email protected]> wrote:
>>>
>>>> The null fields bitset encoder is defines in the pipeline runner proto
>>>> here:
>>>> https://github.com/apache/beam/blob/4b11efdf96ea4a471e078ec49906c40ef033aafb/model/pipeline/src/main/proto/beam_runner_api.proto#L976
>>>>
>>>> Per my reading of the spec, the bit set must include the ceiling of
>>>> num_fields/8 bytes, as it doesn't say "trailing bytes for non-nil in fields
>>>> may be dropped". However it might be interpreted as that by the that an
>>>> empty byte array indicating no nils.  This is what go implements in the
>>>> coder.WriteRowHeader and coder.ReadRowHeader functions.
>>>>
>>>> But that strikes me as a special case for fully populated rows, not a
>>>> natural extension of a poorly phrased general rule.
>>>>
>>>> On Tue, Oct 12, 2021, 1:31 PM Reuven Lax <[email protected]> wrote:
>>>>
>>>>> Do you think that BitSetCoder is incorrect?
>>>>>
>>>>> On Tue, Oct 12, 2021 at 1:27 PM Steve Niemitz <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Yeah I believe they're all bugs/missing features in the python
>>>>>> implementation.  The nullable BitSet one is arguably a bug in the java
>>>>>> implementation, but since there's no low-level spec on how Rows are
>>>>>> actually encoded it's hard to say who's right.  I think Go might have the
>>>>>> same bug there, in which case that's two languages doing it "wrong" and 
>>>>>> one
>>>>>> doing it "right". :P
>>>>>>
>>>>>> On Tue, Oct 12, 2021 at 4:20 PM Reuven Lax <[email protected]> wrote:
>>>>>>
>>>>>>> These are bugs in Python, correct?
>>>>>>>
>>>>>>> On Tue, Oct 12, 2021 at 1:18 PM Steve Niemitz <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> It seems like there's a good amount of incompatibility between java
>>>>>>>> and python wrt beam Rows.  For example the following are unsupported in
>>>>>>>> python (that I've noticed so far)
>>>>>>>> - BYTE
>>>>>>>> - INT16
>>>>>>>> - OneOf
>>>>>>>>
>>>>>>>> Additionally, it seems like nullable fields don't really work
>>>>>>>> correctly, the java BitSetCoder won't encoding trailing empty bytes in 
>>>>>>>> the
>>>>>>>> BitSet, but the python side is expecting every num_fields / 8 bytes to 
>>>>>>>> be
>>>>>>>> present. [1]
>>>>>>>>
>>>>>>>> Certainly these are bugs, but in general it seems to point to a
>>>>>>>> lack of integration testing for xlang interop in general.  I plan on
>>>>>>>> submitting PRs to fix the bugs above (or at least some of them), are 
>>>>>>>> there
>>>>>>>> tests I can change to better exercise these paths?
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/coders/row_coder.py#L198
>>>>>>>>
>>>>>>>

Reply via email to