On Mon, Jul 13, 2020 at 4:43 PM Micah Kornfield <emkornfi...@gmail.com> wrote: >> >> We don't have any test cases that have a future metadata version. I >> made a branch where I added V6 and wrote an IPC message, then found >> that I was unable to determine that it was out of bounds (presumably >> UBSAN would error, though, but we need a runtime error outside of >> ASAN/UBSAN). > > To clarify I don't think UBSAN will error on the existing generated code on > future versions. I believe we had issues with parquet because the enums did > not have an explicit type (compare [1] to [2]) . The version check needs to > be done in our code (comparing against MAX [3]). > > Does that align with your expectations? So we don't get this for free, but > I'm not sure I understand why this is difficult?
If the metadata version comes through as the int16_t value 5 (currently 4 == V5), how do you get to a runtime error? The generated Flatbuffers code is doing a static_cast of 5 to the enum which is UB. Maybe I just don't know what I'm doing. It does not appear to be possible to obtain the raw int16_t value without doing some kind of hacking (e.g. reinterpret_cast of Message* to flatbuffers::Table* and using GetField<int16_t>(VT_VERSION, 0)) I can make a binary file that uses the currently non-existent V6 so you can try to detect it and raise an error > [1] > https://github.com/apache/arrow/blob/master/cpp/src/generated/parquet_types.h#L26 > [2] > https://github.com/apache/arrow/blob/master/cpp/src/generated/Schema_generated.h#L91 > [3] > https://github.com/apache/arrow/blob/master/cpp/src/generated/Schema_generated.h#L109 > > On Mon, Jul 13, 2020 at 2:34 PM Wes McKinney <wesmck...@gmail.com> wrote: >> >> On Mon, Jul 13, 2020 at 4:31 PM Micah Kornfield <emkornfi...@gmail.com> >> wrote: >> > >> > > >> > > >> > > That static cast is currently undefined behavior. >> > >> > Is ubsan reporting this? When looking into the feature enum I tried to >> > understand if that was valid. At the time I read the C++ spec* if the enum >> > has an explicitly declared type, all values in that types range are >> > supported. >> >> We don't have any test cases that have a future metadata version. I >> made a branch where I added V6 and wrote an IPC message, then found >> that I was unable to determine that it was out of bounds (presumably >> UBSAN would error, though, but we need a runtime error outside of >> ASAN/UBSAN). >> >> > The generated enums provide a "max" [1] value that should be comparable >> > against. >> > <https://github.com/apache/arrow/blob/master/cpp/src/generated/Schema_generated.h#L109> >> > >> > >> > * I am not a C++ lawyer >> > >> > [1] >> > https://github.com/apache/arrow/blob/master/cpp/src/generated/Schema_generated.h#L109 >> > >> > On Mon, Jul 13, 2020 at 2:19 PM Wes McKinney <wesmck...@gmail.com> wrote: >> > >> > > I've discovered while working on ARROW-9399 that it is very difficult >> > > with the Flatbuffers API in C++ to detect a MetadataVersion [1] that >> > > is higher than the current version. >> > > >> > > For example, suppose that 3 or 4 years from now we move from version >> > > V5 to version V6. The generated Flatbuffers code looks like this >> > > >> > > org::apache::arrow::flatbuf::MetadataVersion version() const { >> > > return >> > > static_cast<org::apache::arrow::flatbuf::MetadataVersion>(GetField<int16_t>(VT_VERSION, >> > > 0)); >> > > } >> > > >> > > That static cast is currently undefined behavior. >> > > >> > > One way to deal with this would be to add placeholder future versions >> > > (e.g. V6 and V7). >> > > >> > > Another backward-and-forward-compatible option would be to return the >> > > version as a short (int16_t) rather than the enum value, which is >> > > subject to this brittleness. >> > > >> > > Either way unfortunately I think adding forward compatibility checks >> > > is out of scope for 1.0.0 and the risk is low since we don't >> > > anticipate bumping the version anytime soon. >> > > >> > > Thanks, >> > > Wes >> > > >> > > [1]: https://github.com/apache/arrow/blob/master/format/Schema.fbs#L22 >> > >