Thanks all, I agree with validating each record batch independently. I made https://issues.apache.org/jira/browse/ARROW-7966 to ensure this, and that will hopefully iron out any kinks in the different implementations.
Thanks, Bryan On Wed, Feb 26, 2020 at 3:13 PM Wes McKinney <wesmck...@gmail.com> wrote: > I agree with independent validation. > > On Tue, Feb 25, 2020 at 2:55 PM David Li <li.david...@gmail.com> wrote: > > > > Hey Bryan, > > > > Thanks for looking into this issue. I would vote that we should > > validate each batch independently, so we can catch issues related to > > the structure of the data and not just the content. C++ doesn't do any > > detection of empty batches per se, but on both ends it reads all the > > data into a table, which would eliminate any empty batches. > > > > It also wouldn't be reasonable to stop sending batches that are empty, > > because Flight lets you attach metadata to batches, and so an empty > > batch might still have metadata that the client or server wants. > > > > Best, > > David > > > > On 2/24/20, Bryan Cutler <cutl...@gmail.com> wrote: > > > While looking into Null type testing for ARROW-7899, a couple small > issues > > > came up regarding Flight integration testing with empty batches (row > count > > > == 0) that could be worked out with a quick discussion. It seems there > is a > > > small difference between the C++ and Java Flight servers when there are > > > empty record batches at the end of a stream, more details in PR > > > https://github.com/apache/arrow/pull/6476. > > > > > > The Java server sends all record batches, even the empty ones, and the > test > > > client verifies each of these batches matches the batches read from a > JSON > > > file. The C++ servers seems to recognize if the end of the stream is > only > > > empty batches (please correct me if I'm wrong) and will not serve them. > > > This seems reasonable, as there is no more actual data left in the > stream. > > > The C++ test client reads all batches into a table, does the same for > the > > > JSON file, and compares final Tables. I also noticed that empty > batches in > > > the middle of the stream will be served. My questions are: > > > > > > 1) What is the expected behavior of a Flight server for empty record > > > batches, can they be ignored and not sent to the Client? > > > > > > 2) Is it good enough to test against a final concatenation of all > batches > > > in the stream or should each batch be verified individually to ensure > the > > > server is sending out correctly batched data? > > > > > > Thanks, > > > Bryan > > > >