bkietz commented on pull request #7181:
URL: https://github.com/apache/arrow/pull/7181#issuecomment-636937045


   @emkornfield @wesm While adding a unit test, I've become uncertain of the `ColumnReader` contract and whether my solution upholds it:
   
   - [ColumnReader::NextBatch's doc comment](https://github.com/apache/arrow/blob/6716bbd/cpp/src/parquet/arrow/reader.h#L254-L255) states that when no data remains, null should be yielded (which I read as a null `ChunkedArray`).
   - Instead, the tests [assert that the chunked array contains a single null chunk](https://github.com/apache/arrow/blob/6716bbd25ead03ad4774c8d1caa612a8f66e853c/cpp/src/parquet/arrow/arrow_reader_writer_test.cc#L502-L504).
   - When reading into a dictionary, `LeafReader` does neither of these and instead yields an empty `ChunkedArray` (which `NestedListReader` on master is not prepared to handle, causing ARROW-8799).
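To make the three "no data remains" representations above concrete, here is a minimal C++ sketch of a consumer-side check that treats all of them as exhausted. It assumes hypothetical stand-in `Array` and `ChunkedArray` structs rather than Arrow's real classes, and `IsExhausted` is a hypothetical helper, not something in the Arrow codebase or in this PR.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-ins for arrow::Array and arrow::ChunkedArray. These are
// NOT Arrow's real classes; they carry just enough structure to model the
// three "out of data" conventions.
struct Array {
  long length = 0;
};

struct ChunkedArray {
  std::vector<std::shared_ptr<Array>> chunks;
};

// Hypothetical helper: returns true if `out` means "no data remains" under any
// of the three conventions.
bool IsExhausted(const std::shared_ptr<ChunkedArray>& out) {
  // 1. A null ChunkedArray (the reading of NextBatch's doc comment).
  if (out == nullptr) return true;
  // 2. A ChunkedArray with zero chunks (what the dictionary path yields).
  if (out->chunks.empty()) return true;
  // 3. A ChunkedArray whose only chunk is null (what the tests assert).
  return out->chunks.size() == 1 && out->chunks[0] == nullptr;
}
```

A reader like `NestedListReader` could branch on such a check instead of assuming one specific representation, at the cost of leaving the contract itself ambiguous.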
   
   If modifying `NestedListReader` as I have here is unsatisfactory, I could change `TransferDictionary` to ensure that `LeafReader` yields `ChunkedArray{nullptr}` when it's out of data. What do you think?
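The producer-side alternative amounts to normalizing the dictionary path's output before returning it. Below is a minimal sketch under stated assumptions: `Array` and `ChunkedArray` are hypothetical stand-ins for Arrow's classes, and `NormalizeExhausted` is a hypothetical name; an actual change would live in `TransferDictionary` and operate on Arrow's types. The sketch targets the single-null-chunk form the existing tests expect.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-ins, NOT arrow::Array / arrow::ChunkedArray.
struct Array {
  long length = 0;
};

struct ChunkedArray {
  std::vector<std::shared_ptr<Array>> chunks;
};

// Hypothetical normalization step: if the reader produced a ChunkedArray with
// zero chunks, replace it with one holding a single null chunk (the
// "ChunkedArray{nullptr}" form). Null and non-empty results pass through unchanged.
std::shared_ptr<ChunkedArray> NormalizeExhausted(std::shared_ptr<ChunkedArray> out) {
  if (out != nullptr && out->chunks.empty()) {
    auto normalized = std::make_shared<ChunkedArray>();
    normalized->chunks.push_back(nullptr);
    return normalized;
  }
  return out;
}
```

With this approach consumers such as `NestedListReader` only ever see the representation the tests already assert, so the fix stays local to the dictionary path.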


----------------------------------------------------------------
This is an automated message from the Apache Git Service.