Thanks, all. I agree with validating each record batch independently. I filed
https://issues.apache.org/jira/browse/ARROW-7966 to ensure this, and that
will hopefully iron out any kinks in the different implementations.
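To illustrate why comparing only the concatenated result is a weaker check, here is a rough sketch in plain Python rather than the actual Arrow integration harness. Batches are modeled as lists of row values, and `equal_as_table` / `equal_per_batch` are hypothetical helpers, not Arrow or Flight APIs. A table-level comparison cannot tell a stream with a dropped trailing empty batch, or a re-chunked stream, apart from the expected one; a per-batch comparison can.

```python
from itertools import chain
from typing import List, Sequence

# Hypothetical model (not pyarrow): a "record batch" is just a list of row values.
Batch = List[int]


def equal_as_table(expected: Sequence[Batch], actual: Sequence[Batch]) -> bool:
    """Compare only the concatenated rows, like reading both sides into a table."""
    return list(chain.from_iterable(expected)) == list(chain.from_iterable(actual))


def equal_per_batch(expected: Sequence[Batch], actual: Sequence[Batch]) -> bool:
    """Validate each batch independently: same batch count, same rows in each batch."""
    if len(expected) != len(actual):
        return False
    return all(e == a for e, a in zip(expected, actual))


# Expected stream (e.g. from the JSON file) ends with an empty batch (row count == 0).
expected = [[1, 2], [3], []]
# A server that drops trailing empty batches.
trimmed = [[1, 2], [3]]
# A server that sends the same rows, batched differently.
rechunked = [[1], [2, 3], []]

print(equal_as_table(expected, trimmed))     # True
print(equal_per_batch(expected, trimmed))    # False
print(equal_as_table(expected, rechunked))   # True
print(equal_per_batch(expected, rechunked))  # False
```

The per-batch check is also the one that would notice a dropped empty batch that was only there to carry Flight metadata.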

Thanks,
Bryan

On Wed, Feb 26, 2020 at 3:13 PM Wes McKinney <wesmck...@gmail.com> wrote:

> I agree with independent validation.
>
> On Tue, Feb 25, 2020 at 2:55 PM David Li <li.david...@gmail.com> wrote:
> >
> > Hey Bryan,
> >
> > Thanks for looking into this issue. I would vote that we should
> > validate each batch independently, so we can catch issues related to
> > the structure of the data and not just the content. C++ doesn't do any
> > detection of empty batches per se, but on both ends it reads all the
> > data into a table, which would eliminate any empty batches.
> >
> > It also wouldn't be reasonable to stop sending batches that are empty,
> > because Flight lets you attach metadata to batches, and so an empty
> > batch might still have metadata that the client or server wants.
> >
> > Best,
> > David
> >
> > On 2/24/20, Bryan Cutler <cutl...@gmail.com> wrote:
> > > While looking into Null type testing for ARROW-7899, a couple of small
> > > issues came up regarding Flight integration testing with empty batches
> > > (row count == 0) that could be worked out with a quick discussion. It
> > > seems there is a small difference between the C++ and Java Flight
> > > servers when there are empty record batches at the end of a stream;
> > > more details are in PR https://github.com/apache/arrow/pull/6476.
> > >
> > > The Java server sends all record batches, even the empty ones, and the
> > > test client verifies that each of these batches matches the batches
> > > read from a JSON file. The C++ server seems to recognize when the end
> > > of the stream is only empty batches (please correct me if I'm wrong)
> > > and will not serve them. This seems reasonable, as there is no more
> > > actual data left in the stream. The C++ test client reads all batches
> > > into a table, does the same for the JSON file, and compares the final
> > > tables. I also noticed that empty batches in the middle of the stream
> > > are still served. My questions are:
> > >
> > > 1) What is the expected behavior of a Flight server for empty record
> > > batches? Can they be ignored and not sent to the client?
> > >
> > > 2) Is it good enough to test against a final concatenation of all
> > > batches in the stream, or should each batch be verified individually
> > > to ensure the server is sending out correctly batched data?
> > >
> > > Thanks,
> > > Bryan
> > >
>
