I think for Go, it's mostly needs a change here: https://github.com/apache/beam/blob/4b11efdf96ea4a471e078ec49906c40ef033aafb/sdks/go/pkg/beam/core/graph/coder/row.go#L187
Specifically, returning false when i >= len(nils) (along with the actual bit check for when i < len(nils)). On Tue, Oct 12, 2021, 3:19 PM Robert Bradshaw <[email protected]> wrote: > On Tue, Oct 12, 2021 at 2:04 PM Brian Hulette <[email protected]> wrote: > > > > I wrote that spec, my intention was to document what the Java > implementation was doing. I'm sorry my poor description has gotten us in > this situation. I agree with Reuven that we should avoid changing the Java > implementation for a few reasons: > > - As Reuven pointed out, we support update compatibility for row coder > in Java. This isn't true for Python, Go (is it?) > > We also support update for Python, but if we're going to have to > change one of them (and it's certainly a bug that they don't agree) > Java has a much larger streaming install base. > > Fortunately, this should be backwards compatible: coders need not > reject trailing zeros, but can also be tolerant of them not being > present. Should be an easy fix (any takers?). > > > - As Steve pointed out, it's more efficient. > > - Personally, my intention from the outset was to match the Java > implementation. I'd prefer to fix my mistake. > > > > I can also provide a few pointers for the original questions: > > - Regarding Python support for additional types, see BEAM-7996 [1] > > - Regarding cross-language testing, see tests for beam:coder:row:v1 in > standard_coders.yaml [2] > > > > Brian > > > > [1] https://issues.apache.org/jira/browse/BEAM-7996 > > [2] > https://github.com/apache/beam/blob/master/model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml > > > > > > On Tue, Oct 12, 2021 at 1:55 PM Steve Niemitz <[email protected]> > wrote: > >> > >> imo supporting stripping trailing 0 bytes from the bitset encoding is > the way to go. It's both backwards compatible with existing serialized > rows, as well as more space efficient. > >> > >> On Tue, Oct 12, 2021 at 4:52 PM Reuven Lax <[email protected]> wrote: > >>> > >>> Of course. But if we write a spec that doesn't agree with the existing > Java coders, that is also futile. We can't easily change Java coders due to > update compatibility concerns, and we definitely need portability to work > with Java! > >>> > >>> On Tue, Oct 12, 2021 at 1:47 PM Robert Burke <[email protected]> > wrote: > >>>> > >>>> If it's not in the spec it's not Beam, because any alternative is > Anti Portability ;) > >>>> > >>>> On Tue, Oct 12, 2021, 1:45 PM Reuven Lax <[email protected]> wrote: > >>>>> > >>>>> Row just uses the existing Java BitSetCoder, which predates the > writing of that spec :) > >>>>> > >>>>> On Tue, Oct 12, 2021 at 1:42 PM Robert Burke <[email protected]> > wrote: > >>>>>> > >>>>>> The null fields bitset encoder is defines in the pipeline runner > proto here: > https://github.com/apache/beam/blob/4b11efdf96ea4a471e078ec49906c40ef033aafb/model/pipeline/src/main/proto/beam_runner_api.proto#L976 > >>>>>> > >>>>>> Per my reading of the spec, the bit set must include the ceiling of > num_fields/8 bytes, as it doesn't say "trailing bytes for non-nil in fields > may be dropped". However it might be interpreted as that by the that an > empty byte array indicating no nils. This is what go implements in the > coder.WriteRowHeader and coder.ReadRowHeader functions. > >>>>>> > >>>>>> But that strikes me as a special case for fully populated rows, not > a natural extension of a poorly phrased general rule. > >>>>>> > >>>>>> On Tue, Oct 12, 2021, 1:31 PM Reuven Lax <[email protected]> wrote: > >>>>>>> > >>>>>>> Do you think that BitSetCoder is incorrect? > >>>>>>> > >>>>>>> On Tue, Oct 12, 2021 at 1:27 PM Steve Niemitz <[email protected]> > wrote: > >>>>>>>> > >>>>>>>> Yeah I believe they're all bugs/missing features in the python > implementation. The nullable BitSet one is arguably a bug in the java > implementation, but since there's no low-level spec on how Rows are > actually encoded it's hard to say who's right. I think Go might have the > same bug there, in which case that's two languages doing it "wrong" and one > doing it "right". :P > >>>>>>>> > >>>>>>>> On Tue, Oct 12, 2021 at 4:20 PM Reuven Lax <[email protected]> > wrote: > >>>>>>>>> > >>>>>>>>> These are bugs in Python, correct? > >>>>>>>>> > >>>>>>>>> On Tue, Oct 12, 2021 at 1:18 PM Steve Niemitz < > [email protected]> wrote: > >>>>>>>>>> > >>>>>>>>>> It seems like there's a good amount of incompatibility between > java and python wrt beam Rows. For example the following are unsupported > in python (that I've noticed so far) > >>>>>>>>>> - BYTE > >>>>>>>>>> - INT16 > >>>>>>>>>> - OneOf > >>>>>>>>>> > >>>>>>>>>> Additionally, it seems like nullable fields don't really work > correctly, the java BitSetCoder won't encoding trailing empty bytes in the > BitSet, but the python side is expecting every num_fields / 8 bytes to be > present. [1] > >>>>>>>>>> > >>>>>>>>>> Certainly these are bugs, but in general it seems to point to a > lack of integration testing for xlang interop in general. I plan on > submitting PRs to fix the bugs above (or at least some of them), are there > tests I can change to better exercise these paths? > >>>>>>>>>> > >>>>>>>>>> [1] > https://github.com/apache/beam/blob/master/sdks/python/apache_beam/coders/row_coder.py#L198 >
