>
>
> Actually, it looks like my Mac's version of UBSAN doesn't detect the issue
> at all.  I will try on linux a by EOD.

Actually, the issue was I had alignment checks off.  I verify this works
and it appears there is a UBSan issue with flatbuffers::Verify (I'll try to
see if I can find the issue and make a PR upstream).

On Thu, Aug 15, 2019 at 1:03 AM Micah Kornfield <emkornfi...@gmail.com>
wrote:

> I verified with these changes [1], without backwards compatibility
>> support, UBSAN runs cleanly for IPC tests in C++
>
> Actually, it looks like my Mac's version of UBSAN doesn't detect the issue
> at all.  I will try on linux a by EOD.
>
> On Thu, Aug 15, 2019 at 12:52 AM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
>> +1
>>
>> I verified with these changes [1], without backwards compatibility
>> support, UBSAN runs cleanly for IPC tests in C++
>>
>> Just wanted to clarify:
>>
>>> Additionally with this vote, we want to formally approve the change to
>>> the Arrow "file" format to always write the (new 8-byte) end-of-stream
>>> marker, which enables code that processes Arrow streams to safely read
>>> the file's internal messages as though they were a normal stream.
>>
>> This only allows for reading messages safely, we still aren't
>> guaranteeing dictionary batches occur in the file before they are used,
>> correct?
>>
>> Thanks,
>> Micah
>>
>> [1]
>> https://github.com/emkornfield/arrow/commit/8b8348d8bcf62b50c35ddb4926f3d501b4f7147c
>>
>>
>> On Wed, Aug 14, 2019 at 3:43 PM Wes McKinney <wesmck...@gmail.com> wrote:
>>
>>> hi all,
>>>
>>> As we've been discussing [1], there is a need to introduce 4 bytes of
>>> padding into the preamble of the "encapsulated IPC message" format to
>>> ensure that the Flatbuffers metadata payload begins on an 8-byte
>>> aligned memory offset. The alternative to this would be for Arrow
>>> implementations where alignment is important (e.g. C or C++) to copy
>>> the metadata (which is not always small) into memory when it is
>>> unaligned.
>>>
>>> Micah has proposed to address this by adding a
>>> 4-byte "continuation" value at the beginning of the payload
>>> having the value 0xFFFFFFFF. The reason to do it this way is that
>>> old clients will see an invalid length (what is currently the
>>> first 4 bytes of the message -- a 32-bit little endian signed
>>> integer indicating the metadata length) rather than potentially
>>> crashing on a valid length. We also propose to expand the "end of
>>> stream" marker used in the stream and file format from 4 to 8
>>> bytes. This has the additional effect of aligning the file footer
>>> defined in File.fbs.
>>>
>>> This would be a backwards incompatible protocol change, so older Arrow
>>> libraries would not be able to read these new messages. Maintaining
>>> forward compatibility (reading data produced by older libraries) would
>>> be possible as we can reason that a value other than the continuation
>>> value was produced by an older library (and then validate the
>>> Flatbuffer message of course). Arrow implementations could offer a
>>> backward compatibility mode for the sake of old readers if they desire
>>> (this may also assist with testing).
>>>
>>> Additionally with this vote, we want to formally approve the change to
>>> the Arrow "file" format to always write the (new 8-byte) end-of-stream
>>> marker, which enables code that processes Arrow streams to safely read
>>> the file's internal messages as though they were a normal stream.
>>>
>>> The PR making these changes to the IPC documentation is here
>>>
>>> https://github.com/apache/arrow/pull/4951
>>>
>>> Please vote to accept these changes. This vote will be open for at
>>> least 72 hours
>>>
>>> [ ] +1 Adopt these Arrow protocol changes
>>> [ ] +0
>>> [ ] -1 I disagree because...
>>>
>>> Here is my vote: +1
>>>
>>> Thanks,
>>> Wes
>>>
>>> [1]:
>>> https://lists.apache.org/thread.html/8440be572c49b7b2ffb76b63e6d935ada9efd9c1c2021369b6d27786@%3Cdev.arrow.apache.org%3E
>>>
>>

Reply via email to