here is my vote: +1

On Tue, Aug 20, 2019 at 3:33 PM Wes McKinney <wesmck...@gmail.com> wrote:

> We need some more PMC members to look at this vote. I know the issue
> is esoteric but please let us know if you have questions.
>
> On Thu, Aug 15, 2019 at 3:35 AM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
> >
> > >
> > >
> > > Actually, it looks like my Mac's version of UBSAN doesn't detect the
> issue
> > > at all.  I will try on linux a by EOD.
> >
> > Actually, the issue was I had alignment checks off.  I verify this works
> > and it appears there is a UBSan issue with flatbuffers::Verify (I'll try
> to
> > see if I can find the issue and make a PR upstream).
> >
> > On Thu, Aug 15, 2019 at 1:03 AM Micah Kornfield <emkornfi...@gmail.com>
> > wrote:
> >
> > > I verified with these changes [1], without backwards compatibility
> > >> support, UBSAN runs cleanly for IPC tests in C++
> > >
> > > Actually, it looks like my Mac's version of UBSAN doesn't detect the
> issue
> > > at all.  I will try on linux a by EOD.
> > >
> > > On Thu, Aug 15, 2019 at 12:52 AM Micah Kornfield <
> emkornfi...@gmail.com>
> > > wrote:
> > >
> > >> +1
> > >>
> > >> I verified with these changes [1], without backwards compatibility
> > >> support, UBSAN runs cleanly for IPC tests in C++
> > >>
> > >> Just wanted to clarify:
> > >>
> > >>> Additionally with this vote, we want to formally approve the change
> to
> > >>> the Arrow "file" format to always write the (new 8-byte)
> end-of-stream
> > >>> marker, which enables code that processes Arrow streams to safely
> read
> > >>> the file's internal messages as though they were a normal stream.
> > >>
> > >> This only allows for reading messages safely, we still aren't
> > >> guaranteeing dictionary batches occur in the file before they are
> used,
> > >> correct?
> > >>
> > >> Thanks,
> > >> Micah
> > >>
> > >> [1]
> > >>
> https://github.com/emkornfield/arrow/commit/8b8348d8bcf62b50c35ddb4926f3d501b4f7147c
> > >>
> > >>
> > >> On Wed, Aug 14, 2019 at 3:43 PM Wes McKinney <wesmck...@gmail.com>
> wrote:
> > >>
> > >>> hi all,
> > >>>
> > >>> As we've been discussing [1], there is a need to introduce 4 bytes of
> > >>> padding into the preamble of the "encapsulated IPC message" format to
> > >>> ensure that the Flatbuffers metadata payload begins on an 8-byte
> > >>> aligned memory offset. The alternative to this would be for Arrow
> > >>> implementations where alignment is important (e.g. C or C++) to copy
> > >>> the metadata (which is not always small) into memory when it is
> > >>> unaligned.
> > >>>
> > >>> Micah has proposed to address this by adding a
> > >>> 4-byte "continuation" value at the beginning of the payload
> > >>> having the value 0xFFFFFFFF. The reason to do it this way is that
> > >>> old clients will see an invalid length (what is currently the
> > >>> first 4 bytes of the message -- a 32-bit little endian signed
> > >>> integer indicating the metadata length) rather than potentially
> > >>> crashing on a valid length. We also propose to expand the "end of
> > >>> stream" marker used in the stream and file format from 4 to 8
> > >>> bytes. This has the additional effect of aligning the file footer
> > >>> defined in File.fbs.
> > >>>
> > >>> This would be a backwards incompatible protocol change, so older
> Arrow
> > >>> libraries would not be able to read these new messages. Maintaining
> > >>> forward compatibility (reading data produced by older libraries)
> would
> > >>> be possible as we can reason that a value other than the continuation
> > >>> value was produced by an older library (and then validate the
> > >>> Flatbuffer message of course). Arrow implementations could offer a
> > >>> backward compatibility mode for the sake of old readers if they
> desire
> > >>> (this may also assist with testing).
> > >>>
> > >>> Additionally with this vote, we want to formally approve the change
> to
> > >>> the Arrow "file" format to always write the (new 8-byte)
> end-of-stream
> > >>> marker, which enables code that processes Arrow streams to safely
> read
> > >>> the file's internal messages as though they were a normal stream.
> > >>>
> > >>> The PR making these changes to the IPC documentation is here
> > >>>
> > >>> https://github.com/apache/arrow/pull/4951
> > >>>
> > >>> Please vote to accept these changes. This vote will be open for at
> > >>> least 72 hours
> > >>>
> > >>> [ ] +1 Adopt these Arrow protocol changes
> > >>> [ ] +0
> > >>> [ ] -1 I disagree because...
> > >>>
> > >>> Here is my vote: +1
> > >>>
> > >>> Thanks,
> > >>> Wes
> > >>>
> > >>> [1]:
> > >>>
> https://lists.apache.org/thread.html/8440be572c49b7b2ffb76b63e6d935ada9efd9c1c2021369b6d27786@%3Cdev.arrow.apache.org%3E
> > >>>
> > >>
>

Reply via email to