shollyman commented on pull request #10603: URL: https://github.com/apache/arrow/pull/10603#issuecomment-876651126
The issue is the difference in abstractions. In other languages, there exists a concept of a single master "reader" which is schema-aware, and to which we supply a series of independent record batches that can then be decoded. In Java, the construct looks like [`VectorSchemaRoot`](https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/VectorSchemaRoot.html) and `VectorLoader`, where the schema information is held by the root and the loader manages the incoming set of batches. In Python, we've used [pyarrow.Table and from_batches()](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.from_batches) to similar effect, though I see there's also a [`pyarrow.ipc.read_record_batch`](https://arrow.apache.org/docs/python/generated/pyarrow.ipc.read_record_batch.html#pyarrow.ipc.read_record_batch) which similarly accepts a schema and a message.

I'm not finding a similar abstraction in the Go library. My initial thought, based on the existing `WithSchema` option, was that an `ipc.Reader` could be used as a lightweight reader we instantiate for each batch, but the problems with schema behavior seem to indicate a different construct is warranted. Should there be a `BatchReader` or similar construct that can retain a schema, with methods appropriate for passing in messages?
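For discussion, here is a minimal sketch of the shape such a `BatchReader` might take. Everything below is hypothetical, not existing API: `Schema` and `Record` are stand-in placeholder types for the Arrow library's real schema and record types, and `ReadBatch` stands in for whatever message-decoding method the construct would expose. The point is only the shape: the schema is retained by the reader across calls, mirroring Java's `VectorSchemaRoot` plus `VectorLoader` split.

```go
package main

import "fmt"

// Schema and Record are stand-ins for the Arrow library's types;
// a real implementation would use the library's schema and record
// batch types instead.
type Schema struct{ Names []string }

type Record struct {
	schema *Schema
	rows   int64
}

// BatchReader is the proposed construct: it retains a schema and
// decodes each incoming message against that retained schema,
// rather than expecting a schema message per stream.
type BatchReader struct {
	schema *Schema
}

// NewBatchReader constructs a reader bound to a known schema.
func NewBatchReader(s *Schema) *BatchReader { return &BatchReader{schema: s} }

// Schema reports the schema the reader was constructed with.
func (r *BatchReader) Schema() *Schema { return r.schema }

// ReadBatch would decode one serialized record-batch message using
// the retained schema; here it just stamps the schema onto a
// placeholder Record to illustrate the calling pattern.
func (r *BatchReader) ReadBatch(rows int64) *Record {
	return &Record{schema: r.schema, rows: rows}
}

func main() {
	br := NewBatchReader(&Schema{Names: []string{"id", "name"}})
	rec := br.ReadBatch(3)
	fmt.Println(len(rec.schema.Names), rec.rows)
}
```

The design choice being illustrated is that the caller supplies the schema once at construction time and then feeds independent batch messages, so no per-batch reader needs to be instantiated.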
