To help with the discussion, here is a patch with 9 "definitely required" fields made required, and the associated generated C++ changes
https://github.com/apache/arrow/compare/master...wesm:flatbuffers-required (I am not 100% sure about Field.children always being non-null, if there were some doubt we could let it be null) (I would guess that the semantics in Java and elsewhere is the same, but someone should confirm) On Mon, Jan 20, 2020 at 12:59 PM Wes McKinney <wesmck...@gmail.com> wrote: > > On Mon, Jan 20, 2020 at 12:20 PM Jacques Nadeau <jacq...@apache.org> wrote: > > > > > > > > I think what we have determined is that the changes that are being > > > discussed in this thread would not render any existing serialized > > > Flatbuffers unreadable, unless they are malformed / unable to be > > > read with the current libraries. > > > > > > > I think we need to separate two different things: > > > > Point 1: If all data is populated as we expect, changing from optional to > > required is a noop. > > Point 2: All current Arrow code fails to work in all cases where a field is > > not populated as expected. > > I looked at the before/after when adding "(required)" to a field and > it appears the only change on the read path is the generated verifier > (which you have to explicitly invoke, and you can skip verification) > > https://gist.github.com/wesm/f1a9e7492b0daee07ccef0566c3900a2 > > This is distinct from Protobuf (I think?) because protobuf verifies > the presence of required fields when parsing the protobuf. I assume > it's the same in other languages but we'll have to check to be sure > > This means that if you _fail to invoke the verifier_, you can still > follow a null pointer, but applications that use the verifier will > stop there and not have to implement their own null checks. > > > > > I think one needs to prove both points in order for this change to be a > > compatible change. I agree that point 1 is proven. I don't think point 2 > > has been proven. In fact, I'm not sure how one could prove it(*). The bar > > for changing the format in a backwards incompatible way (assuming we can't > > prove point 2) should be high given how long the specification has been > > out. It doesn't feel like the benefits here outweigh the cost of changing > > in an incompatible way (especially given the subjective nature of optional > > vs. required). > > > > It's probably less of a concern for > > > an in-house protocol than for an open standard like Arrow where there > > > may be multiple third-party implementations around at some point. > > > > > > > This is subjective, just like the general argument around whether required > > or optional should be used in protobuf. My point in sharing was to (1) > > point out that the initial implementation choices weren't done without > > reason and (2) that we should avoid arguing that either direction is more > > technically sound (which seemed to be the direction the argument was > > taking). > > > > (*) One could do an exhaustive analysis of every codepath. This would work > > for libraries in the Arrow project. However, the flatbuf definition is part > > of the external specification meaning that other codepaths likely exist > > that we could not evaluate.