If it's not in the spec, it's not Beam, because any alternative is
Anti-Portability ;)

On Tue, Oct 12, 2021, 1:45 PM Reuven Lax <[email protected]> wrote:

> Row just uses the existing Java BitSetCoder, which predates the writing of
> that spec :)
>
> On Tue, Oct 12, 2021 at 1:42 PM Robert Burke <[email protected]> wrote:
>
>> The null-fields bitset encoding is defined in the pipeline runner API proto
>> here:
>> https://github.com/apache/beam/blob/4b11efdf96ea4a471e078ec49906c40ef033aafb/model/pipeline/src/main/proto/beam_runner_api.proto#L976
>>
>> Per my reading of the spec, the bitset must include ceil(num_fields/8)
>> bytes, as it doesn't say "trailing bytes for non-nil fields may be
>> dropped". However, it might be read that way because of the rule that an
>> empty byte array indicates no nils. That rule is what Go implements in the
>> coder.WriteRowHeader and coder.ReadRowHeader functions.
>>
>> But that strikes me as a special case for fully populated rows, not a
>> natural extension of a poorly phrased general rule.
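
A minimal Python sketch of the strict reading described above, assuming field i is tracked by bit (i % 8) of byte (i // 8), least significant bit first, with 1 meaning null, and keeping the empty-array special case for rows with no nulls. The function name encode_null_bitset is hypothetical and is not an API of any Beam SDK.

```python
from typing import Sequence


def encode_null_bitset(nulls: Sequence[bool]) -> bytes:
    """Hypothetical encoder for the strict reading of the spec.

    Assumes field i maps to bit (i % 8) of byte (i // 8), LSB first,
    where a set bit means the field is null.
    """
    if not any(nulls):
        # Special case for fully populated rows: an empty byte array.
        return b""
    # Otherwise always emit ceil(num_fields / 8) bytes, even if the
    # trailing ones are all zero.
    out = bytearray((len(nulls) + 7) // 8)
    for i, is_null in enumerate(nulls):
        if is_null:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


# Ten fields, only field 0 null: the strict reading keeps both bytes.
print(encode_null_bitset([True] + [False] * 9).hex())  # 0100
```

Under this reading, the only case where fewer than ceil(num_fields/8) bytes appear is the fully populated row.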
>>
>> On Tue, Oct 12, 2021, 1:31 PM Reuven Lax <[email protected]> wrote:
>>
>>> Do you think that BitSetCoder is incorrect?
>>>
>>> On Tue, Oct 12, 2021 at 1:27 PM Steve Niemitz <[email protected]>
>>> wrote:
>>>
>>>> Yeah, I believe they're all bugs/missing features in the Python
>>>> implementation. The nullable BitSet one is arguably a bug in the Java
>>>> implementation, but since there's no low-level spec on how Rows are
>>>> actually encoded, it's hard to say who's right. I think Go might have the
>>>> same bug there, in which case that's two languages doing it "wrong" and one
>>>> doing it "right". :P
>>>>
>>>> On Tue, Oct 12, 2021 at 4:20 PM Reuven Lax <[email protected]> wrote:
>>>>
>>>>> These are bugs in Python, correct?
>>>>>
>>>>> On Tue, Oct 12, 2021 at 1:18 PM Steve Niemitz <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> It seems like there's a good amount of incompatibility between Java
>>>>>> and Python with respect to Beam Rows. For example, the following are
>>>>>> unsupported in Python (that I've noticed so far):
>>>>>> - BYTE
>>>>>> - INT16
>>>>>> - OneOf
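
A minimal Python sketch of BYTE and INT16 field encoding, assuming the Python side should match the layouts of Java's ByteCoder (one signed byte) and BigEndianShortCoder (two bytes, big-endian), which is our understanding of what Java's RowCoder uses for these types. The helper names are hypothetical, and OneOf is not covered.

```python
import io
import struct


def encode_byte(value: int, out: io.BytesIO) -> None:
    # Assumed layout: a single signed byte, as written by Java's ByteCoder.
    out.write(struct.pack(">b", value))


def decode_byte(inp: io.BytesIO) -> int:
    return struct.unpack(">b", inp.read(1))[0]


def encode_int16(value: int, out: io.BytesIO) -> None:
    # Assumed layout: two bytes, big-endian, as in Java's BigEndianShortCoder.
    out.write(struct.pack(">h", value))


def decode_int16(inp: io.BytesIO) -> int:
    return struct.unpack(">h", inp.read(2))[0]


buf = io.BytesIO()
encode_byte(-2, buf)
encode_int16(258, buf)
print(buf.getvalue().hex())  # fe0102
```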
>>>>>>
>>>>>> Additionally, it seems like nullable fields don't really work
>>>>>> correctly: the Java BitSetCoder won't encode trailing empty bytes in the
>>>>>> BitSet, but the Python side expects all ceil(num_fields / 8) bytes to be
>>>>>> present. [1]
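
A minimal Python sketch of the mismatch described above, assuming bit (i % 8) of byte (i // 8) marks field i as null. encode_minimal mimics java.util.BitSet.toByteArray(), which drops trailing zero bytes; decode_strict indexes every one of the ceil(num_fields / 8) bytes, in the style described for the Python side; decode_lenient is one hypothetical fix that treats missing trailing bytes as zeros. All three names are hypothetical, not Beam APIs.

```python
from typing import List, Sequence


def encode_minimal(nulls: Sequence[bool]) -> bytes:
    """Mimics java.util.BitSet.toByteArray(): trailing zero bytes are dropped."""
    # Index of the highest null field plus one (0 when there are no nulls).
    length = max((i + 1 for i, n in enumerate(nulls) if n), default=0)
    out = bytearray((length + 7) // 8)
    for i in range(length):
        if nulls[i]:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def decode_strict(bitset: bytes, num_fields: int) -> List[bool]:
    """Expects all ceil(num_fields / 8) bytes unless the array is empty."""
    if not bitset:
        return [False] * num_fields
    return [bool((bitset[i // 8] >> (i % 8)) & 1) for i in range(num_fields)]


def decode_lenient(bitset: bytes, num_fields: int) -> List[bool]:
    """Hypothetical tolerant decoder: absent trailing bytes count as zero."""
    return [
        i // 8 < len(bitset) and bool((bitset[i // 8] >> (i % 8)) & 1)
        for i in range(num_fields)
    ]


nulls = [True] + [False] * 9   # 10 fields, only field 0 is null
wire = encode_minimal(nulls)   # b'\x01': one byte instead of two
assert decode_lenient(wire, 10) == nulls
try:
    decode_strict(wire, 10)    # reads byte 1, which was never written
except IndexError:
    print("strict decoder fails on the minimal encoding")
```

Treating absent bytes as zero accepts both the minimal Java-style encoding and the full-length one, so a tolerant reader works regardless of which writer turns out to be "right".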
>>>>>>
>>>>>> These are certainly bugs, but more broadly they seem to point to a lack
>>>>>> of integration testing for xlang interop. I plan on submitting PRs to
>>>>>> fix the bugs above (or at least some of them). Are there tests I can
>>>>>> change to better exercise these paths?
>>>>>>
>>>>>> [1]
>>>>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/coders/row_coder.py#L198
>>>>>>
>>>>>