On Mon, Jul 13, 2020 at 4:43 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
>>
>> We don't have any test cases that have a future metadata version. I
>> made a branch where I added V6 and wrote an IPC message, then found
>> that I was unable to determine that it was out of bounds (presumably
>> UBSAN would error, though, but we need a runtime error outside of
>> ASAN/UBSAN).
>
> To clarify I don't think UBSAN will error on the existing generated code on 
> future versions. I believe we had issues with parquet because the enums did 
> not have an explicit type (compare [1] to [2]) .  The version check needs to 
> be done in our code (comparing against MAX [3]).
>
> Does that align with your expectations?  So we don't get this for free, but 
> I'm not sure I understand why this is difficult?

If the metadata version comes through as the int16_t value 5
(currently 4 == V5), how do you get to a runtime error? The generated
Flatbuffers code is doing a static_cast of 5 to the enum which is UB.
Maybe I just don't know what I'm doing. It does not appear to be
possible to obtain the raw int16_t value without doing some kind of
hacking (e.g. reinterpret_cast of Message* to flatbuffers::Table* and
using GetField<int16_t>(VT_VERSION, 0))

I can make a binary file that uses the currently non-existent V6 so
you can try to detect it and
raise an error



> [1] 
> https://github.com/apache/arrow/blob/master/cpp/src/generated/parquet_types.h#L26
> [2] 
> https://github.com/apache/arrow/blob/master/cpp/src/generated/Schema_generated.h#L91
> [3] 
> https://github.com/apache/arrow/blob/master/cpp/src/generated/Schema_generated.h#L109
>
> On Mon, Jul 13, 2020 at 2:34 PM Wes McKinney <wesmck...@gmail.com> wrote:
>>
>> On Mon, Jul 13, 2020 at 4:31 PM Micah Kornfield <emkornfi...@gmail.com> 
>> wrote:
>> >
>> > >
>> > >
>> > > That static cast is currently undefined behavior.
>> >
>> > Is ubsan reporting this?  When looking into the feature enum I tried to
>> > understand if that was valid. At the time I read the C++ spec* if the enum
>> > has an explicitly declared type, all values in that types range are
>> > supported.
>>
>> We don't have any test cases that have a future metadata version. I
>> made a branch where I added V6 and wrote an IPC message, then found
>> that I was unable to determine that it was out of bounds (presumably
>> UBSAN would error, though, but we need a runtime error outside of
>> ASAN/UBSAN).
>>
>> > The generated enums provide  a "max" [1] value that should be comparable
>> > against.
>> > <https://github.com/apache/arrow/blob/master/cpp/src/generated/Schema_generated.h#L109>
>> >
>> >
>> > * I am not a C++ lawyer
>> >
>> > [1]
>> > https://github.com/apache/arrow/blob/master/cpp/src/generated/Schema_generated.h#L109
>> >
>> > On Mon, Jul 13, 2020 at 2:19 PM Wes McKinney <wesmck...@gmail.com> wrote:
>> >
>> > > I've discovered while working on ARROW-9399 that it is very difficult
>> > > with the Flatbuffers API in C++ to detect a MetadataVersion [1] that
>> > > is higher than the current version.
>> > >
>> > > For example, suppose that 3 or 4 years from now we move from version
>> > > V5 to version V6. The generated Flatbuffers code looks like this
>> > >
>> > > org::apache::arrow::flatbuf::MetadataVersion version() const {
>> > >   return
>> > > static_cast<org::apache::arrow::flatbuf::MetadataVersion>(GetField<int16_t>(VT_VERSION,
>> > > 0));
>> > > }
>> > >
>> > > That static cast is currently undefined behavior.
>> > >
>> > > One way to deal with this would be to add placeholder future versions
>> > > (e.g. V6 and V7).
>> > >
>> > > Another backward-and-forward-compatible option would be to return the
>> > > version as a short (int16_t) rather than the enum value, which is
>> > > subject to this brittleness.
>> > >
>> > > Either way unfortunately I think adding forward compatibility checks
>> > > is out of scope for 1.0.0 and the risk is low since we don't
>> > > anticipate bumping the version anytime soon.
>> > >
>> > > Thanks,
>> > > Wes
>> > >
>> > > [1]: https://github.com/apache/arrow/blob/master/format/Schema.fbs#L22
>> > >

Reply via email to