If it's not in the spec it's not Beam, because any alternative is Anti Portability ;)
On Tue, Oct 12, 2021, 1:45 PM Reuven Lax <[email protected]> wrote: > Row just uses the existing Java BitSetCoder, which predates the writing of > that spec :) > > On Tue, Oct 12, 2021 at 1:42 PM Robert Burke <[email protected]> wrote: > >> The null fields bitset encoder is defines in the pipeline runner proto >> here: >> https://github.com/apache/beam/blob/4b11efdf96ea4a471e078ec49906c40ef033aafb/model/pipeline/src/main/proto/beam_runner_api.proto#L976 >> >> Per my reading of the spec, the bit set must include the ceiling of >> num_fields/8 bytes, as it doesn't say "trailing bytes for non-nil in fields >> may be dropped". However it might be interpreted as that by the that an >> empty byte array indicating no nils. This is what go implements in the >> coder.WriteRowHeader and coder.ReadRowHeader functions. >> >> But that strikes me as a special case for fully populated rows, not a >> natural extension of a poorly phrased general rule. >> >> On Tue, Oct 12, 2021, 1:31 PM Reuven Lax <[email protected]> wrote: >> >>> Do you think that BitSetCoder is incorrect? >>> >>> On Tue, Oct 12, 2021 at 1:27 PM Steve Niemitz <[email protected]> >>> wrote: >>> >>>> Yeah I believe they're all bugs/missing features in the python >>>> implementation. The nullable BitSet one is arguably a bug in the java >>>> implementation, but since there's no low-level spec on how Rows are >>>> actually encoded it's hard to say who's right. I think Go might have the >>>> same bug there, in which case that's two languages doing it "wrong" and one >>>> doing it "right". :P >>>> >>>> On Tue, Oct 12, 2021 at 4:20 PM Reuven Lax <[email protected]> wrote: >>>> >>>>> These are bugs in Python, correct? >>>>> >>>>> On Tue, Oct 12, 2021 at 1:18 PM Steve Niemitz <[email protected]> >>>>> wrote: >>>>> >>>>>> It seems like there's a good amount of incompatibility between java >>>>>> and python wrt beam Rows. For example the following are unsupported in >>>>>> python (that I've noticed so far) >>>>>> - BYTE >>>>>> - INT16 >>>>>> - OneOf >>>>>> >>>>>> Additionally, it seems like nullable fields don't really work >>>>>> correctly, the java BitSetCoder won't encoding trailing empty bytes in >>>>>> the >>>>>> BitSet, but the python side is expecting every num_fields / 8 bytes to be >>>>>> present. [1] >>>>>> >>>>>> Certainly these are bugs, but in general it seems to point to a lack >>>>>> of integration testing for xlang interop in general. I plan on >>>>>> submitting >>>>>> PRs to fix the bugs above (or at least some of them), are there tests I >>>>>> can >>>>>> change to better exercise these paths? >>>>>> >>>>>> [1] >>>>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/coders/row_coder.py#L198 >>>>>> >>>>>
