Benjamin Duffield created ARROW-2307:
----------------------------------------

             Summary: Unable to read arrow stream containing 0 record batches 
using pyarrow
                 Key: ARROW-2307
                 URL: https://issues.apache.org/jira/browse/ARROW-2307
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++, Python
    Affects Versions: 0.8.0
            Reporter: Benjamin Duffield


Using the Java Arrow library, I'm creating an Arrow stream with the stream
writer.

Sometimes I don't have anything to serialize, so I don't write any record
batches. My Arrow stream thus consists of just a schema message, optionally
followed by the end-of-stream marker:
{code}
<SCHEMA>
<EOS [optional]: int32>
{code}

I am able to deserialize this Arrow stream correctly using the Java stream
reader, but reading it with Python fails:
{code}
import pyarrow as pa
# ...
reader = pa.open_stream(stream)
df = reader.read_all().to_pandas()
{code}

This produces:

{code}
  File "ipc.pxi", line 307, in pyarrow.lib._RecordBatchReader.read_all
  File "error.pxi", line 77, in pyarrow.lib.check_status
ArrowInvalid: Must pass at least one record batch
{code}

i.e. we're hitting the check in 
https://github.com/apache/arrow/blob/apache-arrow-0.8.0/cpp/src/arrow/table.cc#L284

The workaround we're currently using is to always serialize at least one
record batch, even if it's empty. However, I think it would be nice to either
support a stream without record batches, or explicitly disallow such streams
and make the Java implementation match that behaviour.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
