I think Micah is right.  Also, it seems (from checking the source) that
the Flatbuffers verifier doesn't check that enums are in range, so we
may possibly allow out-of-range values and interpret them as "highest
supported version".

Regards

Antoine.


Le 14/07/2020 à 00:53, Micah Kornfield a écrit :
> To clarify on UBSAN and enums.  My understanding is:
> 
> enum A { a = 1, b =2, c = 3};
> class enum B : int16_t { a = 1, b = 2, c = 3};
> 
> A a = static_cast<A>(4); // UB
> B b = static_cast<B>(4); // Not UB.  Declaring the holding type makes this
> allowable.
> 
> 
> On Mon, Jul 13, 2020 at 3:44 PM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
> 
>> Please see [1].  I ran this arrow-ipc-read-write-test with UBSAN enabled
>> and it passed (this isn't my normal dev environment so please double check).
>>
>>
>> https://github.com/emkornfield/arrow/commit/7fbd0fb95f7ea164284720428c7974b87b4b2443
>>
>> On Mon, Jul 13, 2020 at 3:12 PM Micah Kornfield <emkornfi...@gmail.com>
>> wrote:
>>
>>> I think this might be more complicated, let me see if i can write a test
>>> that demonstrates what I'm talking about.
>>>
>>> On Mon, Jul 13, 2020 at 3:10 PM Wes McKinney <wesmck...@gmail.com> wrote:
>>>
>>>> Here's a patch that does the check
>>>>
>>>>
>>>> https://github.com/wesm/arrow/commit/5bfdb4255a66a4ec62b1c36ba07682fad47df9a7
>>>>
>>>> Here is a serialized schema that uses a V6 version
>>>>
>>>>
>>>> https://drive.google.com/file/d/1GiWh5yKXdMaLRWU5K4cnGW2ilybF0LF_/view?usp=sharing
>>>>
>>>> See in action
>>>> https://gist.github.com/wesm/f9621a626d56491b0bd6c8a131acf518
>>>>
>>>> This seems hacky to me, but maybe it's OK?
>>>>
>>>> On Mon, Jul 13, 2020 at 4:53 PM Wes McKinney <wesmck...@gmail.com>
>>>> wrote:
>>>>>
>>>>> On Mon, Jul 13, 2020 at 4:43 PM Micah Kornfield <emkornfi...@gmail.com>
>>>> wrote:
>>>>>>>
>>>>>>> We don't have any test cases that have a future metadata version. I
>>>>>>> made a branch where I added V6 and wrote an IPC message, then found
>>>>>>> that I was unable to determine that it was out of bounds (presumably
>>>>>>> UBSAN would error, though, but we need a runtime error outside of
>>>>>>> ASAN/UBSAN).
>>>>>>
>>>>>> To clarify I don't think UBSAN will error on the existing generated
>>>> code on future versions. I believe we had issues with parquet because the
>>>> enums did not have an explicit type (compare [1] to [2]) .  The version
>>>> check needs to be done in our code (comparing against MAX [3]).
>>>>>>
>>>>>> Does that align with your expectations?  So we don't get this for
>>>> free, but I'm not sure I understand why this is difficult?
>>>>>
>>>>> If the metadata version comes through as the int16_t value 5
>>>>> (currently 4 == V5), how do you get to a runtime error? The generated
>>>>> Flatbuffers code is doing a static_cast of 5 to the enum which is UB.
>>>>> Maybe I just don't know what I'm doing. It does not appear to be
>>>>> possible to obtain the raw int16_t value without doing some kind of
>>>>> hacking (e.g. reinterpret_cast of Message* to flatbuffers::Table* and
>>>>> using GetField<int16_t>(VT_VERSION, 0))
>>>>>
>>>>> I can make a binary file that uses the currently non-existent V6 so
>>>>> you can try to detect it and
>>>>> raise an error
>>>>>
>>>>>
>>>>>
>>>>>> [1]
>>>> https://github.com/apache/arrow/blob/master/cpp/src/generated/parquet_types.h#L26
>>>>>> [2]
>>>> https://github.com/apache/arrow/blob/master/cpp/src/generated/Schema_generated.h#L91
>>>>>> [3]
>>>> https://github.com/apache/arrow/blob/master/cpp/src/generated/Schema_generated.h#L109
>>>>>>
>>>>>> On Mon, Jul 13, 2020 at 2:34 PM Wes McKinney <wesmck...@gmail.com>
>>>> wrote:
>>>>>>>
>>>>>>> On Mon, Jul 13, 2020 at 4:31 PM Micah Kornfield <
>>>> emkornfi...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> That static cast is currently undefined behavior.
>>>>>>>>
>>>>>>>> Is ubsan reporting this?  When looking into the feature enum I
>>>> tried to
>>>>>>>> understand if that was valid. At the time I read the C++ spec* if
>>>> the enum
>>>>>>>> has an explicitly declared type, all values in that types range
>>>> are
>>>>>>>> supported.
>>>>>>>
>>>>>>> We don't have any test cases that have a future metadata version. I
>>>>>>> made a branch where I added V6 and wrote an IPC message, then found
>>>>>>> that I was unable to determine that it was out of bounds (presumably
>>>>>>> UBSAN would error, though, but we need a runtime error outside of
>>>>>>> ASAN/UBSAN).
>>>>>>>
>>>>>>>> The generated enums provide  a "max" [1] value that should be
>>>> comparable
>>>>>>>> against.
>>>>>>>> <
>>>> https://github.com/apache/arrow/blob/master/cpp/src/generated/Schema_generated.h#L109
>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> * I am not a C++ lawyer
>>>>>>>>
>>>>>>>> [1]
>>>>>>>>
>>>> https://github.com/apache/arrow/blob/master/cpp/src/generated/Schema_generated.h#L109
>>>>>>>>
>>>>>>>> On Mon, Jul 13, 2020 at 2:19 PM Wes McKinney <wesmck...@gmail.com>
>>>> wrote:
>>>>>>>>
>>>>>>>>> I've discovered while working on ARROW-9399 that it is very
>>>> difficult
>>>>>>>>> with the Flatbuffers API in C++ to detect a MetadataVersion [1]
>>>> that
>>>>>>>>> is higher than the current version.
>>>>>>>>>
>>>>>>>>> For example, suppose that 3 or 4 years from now we move from
>>>> version
>>>>>>>>> V5 to version V6. The generated Flatbuffers code looks like this
>>>>>>>>>
>>>>>>>>> org::apache::arrow::flatbuf::MetadataVersion version() const {
>>>>>>>>>   return
>>>>>>>>>
>>>> static_cast<org::apache::arrow::flatbuf::MetadataVersion>(GetField<int16_t>(VT_VERSION,
>>>>>>>>> 0));
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> That static cast is currently undefined behavior.
>>>>>>>>>
>>>>>>>>> One way to deal with this would be to add placeholder future
>>>> versions
>>>>>>>>> (e.g. V6 and V7).
>>>>>>>>>
>>>>>>>>> Another backward-and-forward-compatible option would be to
>>>> return the
>>>>>>>>> version as a short (int16_t) rather than the enum value, which
>>>> is
>>>>>>>>> subject to this brittleness.
>>>>>>>>>
>>>>>>>>> Either way unfortunately I think adding forward compatibility
>>>> checks
>>>>>>>>> is out of scope for 1.0.0 and the risk is low since we don't
>>>>>>>>> anticipate bumping the version anytime soon.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Wes
>>>>>>>>>
>>>>>>>>> [1]:
>>>> https://github.com/apache/arrow/blob/master/format/Schema.fbs#L22
>>>>>>>>>
>>>>
>>>
> 

Reply via email to