Benjamin Duffield created ARROW-2307:

             Summary: Unable to read arrow stream containing 0 record batches using pyarrow
                 Key: ARROW-2307
             Project: Apache Arrow
          Issue Type: Bug
          Components: C, Python
    Affects Versions: 0.8.0
            Reporter: Benjamin Duffield

Using Arrow Java, I'm creating an Arrow stream with the stream writer.

Sometimes I don't have anything to serialize, so I don't write any record
batches. My Arrow stream therefore consists of just a schema message followed
by the optional end-of-stream marker:

<SCHEMA>
<EOS [optional]: int32>

I can deserialize this Arrow stream correctly with the Java stream reader,
but reading it with pyarrow fails:

import pyarrow as pa
# ...
# `stream` holds the bytes produced by the Java stream writer
reader = pa.open_stream(stream)
# read_all() raises ArrowInvalid when the stream contains no record batches
df = reader.read_all().to_pandas()


  File "ipc.pxi", line 307, in pyarrow.lib._RecordBatchReader.read_all
  File "error.pxi", line 77, in pyarrow.lib.check_status
ArrowInvalid: Must pass at least one record batch

i.e. we're hitting the check in

Our current workaround is to always serialize at least one record batch, even
if it is empty. However, it would be nice to either support streams with no
record batches or explicitly disallow them, and then make the Java
implementation behave the same way.

This message was sent by Atlassian JIRA
