[
https://issues.apache.org/jira/browse/ARROW-15747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552043#comment-17552043
]
Antoine Pitrou commented on ARROW-15747:
----------------------------------------
Use case 2 (heterogeneous chunking) is easily addressed by redoing the chunking.
That's what Arrow C++ does when you want to get a RecordBatchReader out of a
Table. I agree with use cases 1 and 3.
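As a rough illustration of what "redoing the chunking" means, here is a minimal pure-Python sketch. It is not Arrow C++'s actual implementation; the helper names {{common_boundaries}} and {{rechunk}} are hypothetical, and plain lists stand in for Arrow arrays. The idea is to split every column at the union of all columns' chunk boundaries so that the chunks line up row for row.

```python
from itertools import accumulate


def common_boundaries(*chunk_lengths):
    """Return the sorted union of the cumulative chunk end offsets of all columns."""
    totals = {sum(lengths) for lengths in chunk_lengths}
    if len(totals) != 1:
        raise ValueError("all columns must have the same total length")
    offsets = set()
    for lengths in chunk_lengths:
        offsets.update(accumulate(lengths))
    # An empty leading chunk would contribute offset 0; it is not a real boundary.
    offsets.discard(0)
    return sorted(offsets)


def rechunk(chunks, boundaries):
    """Re-split one column (a list of chunks) at the given end offsets."""
    flat = [value for chunk in chunks for value in chunk]
    out, start = [], 0
    for end in boundaries:
        out.append(flat[start:end])
        start = end
    return out


# Two columns with the same total length (5) but different chunk layouts.
col_a = [[1, 2], [3, 4, 5]]
col_b = [["x", "y", "z"], ["u", "v"]]

bounds = common_boundaries(
    [len(chunk) for chunk in col_a],
    [len(chunk) for chunk in col_b],
)
aligned_a = rechunk(col_a, bounds)
aligned_b = rechunk(col_b, bounds)
```

After this step both columns have chunks of the same lengths, so each pair of chunks could be emitted as one batch. A real implementation would slice the existing buffers rather than flatten and copy the values.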
There are two ways this could be added to Arrow C++ (and PyArrow):
# return a {{RecordBatchReader}} that would read batches of a single column
# add a facility like {{RecordBatchReader}} but on Arrays
The first approach is easier but perhaps less elegant. The second approach
would also make it possible to implement an export function, which would be
a bit clunky under the first approach.
> [Python] Support C stream interface of single arrays
> ----------------------------------------------------
>
> Key: ARROW-15747
> URL: https://issues.apache.org/jira/browse/ARROW-15747
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++, Python
> Reporter: Jorge Leitão
> Priority: Major
>
> It seems that the C stream interface in pyarrow currently requires the array
> to be a StructArray.
> I do not see this constraint in the spec
> (https://arrow.apache.org/docs/format/CStreamInterface.html).
> The error I get when I pass an Int32Array to it (declared on the schema):
> {code:java}
> Invalid: Cannot import schema: ArrowSchema describes non-struct type int32
> {code}
> It would be nice to support everything, like the C data interface.