[
https://issues.apache.org/jira/browse/ARROW-10368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Remi Dettai updated ARROW-10368:
--------------------------------
Summary: [Rust][DataFusion] Make InMemoryScan work on iterators of
RecordBatch (was: [Rust][Datafusion] Make InMemoryScan work on iterators of
RecordBatch)
> [Rust][DataFusion] Make InMemoryScan work on iterators of RecordBatch
> ---------------------------------------------------------------------
>
> Key: ARROW-10368
> URL: https://issues.apache.org/jira/browse/ARROW-10368
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Rust, Rust - DataFusion
> Reporter: Remi Dettai
> Priority: Major
>
> Currently, InMemoryScan takes a Vec<Vec<RecordBatch>> as data.
> - the outer Vec separates the partitions
> - the inner Vec contains all the RecordBatch for one partition
> The inner Vec is then converted into an iterator when the LogicalPlan is
> turned into a PhysicalPlan.
> I suggest that InMemoryScan should take Vec<Iter<RecordBatch>> instead. This
> would make it possible to plug custom Scan implementations into DataFusion
> without the need to read their data entirely into memory. It would still work
> pretty seamlessly with Vec<Vec<RecordBatch>>, which would just need to be
> converted with data.iter().map(|x| x.iter()) first.
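
The proposal above can be sketched roughly as follows. This is only an illustration, not DataFusion's actual API: the RecordBatch struct is a stand-in for arrow's type so that the snippet compiles on its own, and the Partition alias and the helpers from_vecs, generated_partition and count_rows are hypothetical names. It assumes that "Iter<RecordBatch>" means a boxed trait object.

```rust
// Stand-in for arrow's RecordBatch so the sketch compiles without external crates.
#[derive(Debug, Clone, PartialEq)]
struct RecordBatch {
    rows: Vec<i64>,
}

// One partition: a boxed iterator that yields batches lazily.
// (Hypothetical alias; the issue writes this loosely as Iter<RecordBatch>.)
type Partition = Box<dyn Iterator<Item = RecordBatch>>;

// Adapt the existing, fully materialized layout to the iterator-based one.
// The outer Vec still separates partitions; each inner Vec becomes an iterator.
fn from_vecs(data: Vec<Vec<RecordBatch>>) -> Vec<Partition> {
    data.into_iter()
        .map(|batches| Box::new(batches.into_iter()) as Partition)
        .collect()
}

// A custom source that builds each batch on demand, so the whole
// partition is never held in memory at once.
fn generated_partition(num_batches: i64, batch_size: i64) -> Partition {
    Box::new((0..num_batches).map(move |b| RecordBatch {
        rows: (b * batch_size..(b + 1) * batch_size).collect(),
    }))
}

// Example consumer: drains every partition one batch at a time and counts rows.
fn count_rows(partitions: Vec<Partition>) -> usize {
    partitions
        .into_iter()
        .map(|p| p.map(|batch| batch.rows.len()).sum::<usize>())
        .sum()
}
```

A consumer such as count_rows sees no difference between a partition built from a Vec and one that is generated lazily, which is the point of the suggested change. Taking ownership with into_iter() is a choice made for this sketch; a borrowing variant would need explicit lifetimes on the boxed iterator.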
--
This message was sent by Atlassian Jira
(v8.3.4#803005)