shollyman commented on pull request #10603:
URL: https://github.com/apache/arrow/pull/10603#issuecomment-876651126


   The issue is the difference in abstractions.  In other languages, there 
exists the concept of a single master "reader" which is schema-aware, and 
to which we supply a series of independent record batches that can then be 
decoded.
   
   In Java, the construct looks like 
[`VectorSchemaRoot`](https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/VectorSchemaRoot.html)
 and `VectorLoader`, where the schema information is held by the root and the 
loader manages the incoming set of batches.
   
   In Python, we've used [pyarrow.Table and 
from_batches()](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.from_batches)
 to similar effect, though I see there's also 
[`pyarrow.ipc.read_record_batch`](https://arrow.apache.org/docs/python/generated/pyarrow.ipc.read_record_batch.html#pyarrow.ipc.read_record_batch),
 which similarly accepts a schema and a message.
   
   I'm not finding a similar abstraction in the Go library.  My initial 
thought, based on the existing WithSchema option, was that an ipc.Reader 
could be used as a lightweight reader we instantiate for each batch.  But the 
problems with how to deal with schema behavior seem to indicate that a 
different construct is warranted.  Should there be a BatchReader or similar 
construct that can retain a schema, with methods appropriate for passing in 
messages?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

