[
https://issues.apache.org/jira/browse/ARROW-15258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17482093#comment-17482093
]
Weston Pace commented on ARROW-15258:
-------------------------------------
I'd like to avoid ScanOptions entirely but I'm not opposed to using
InMemoryDataset.
A filter should not be needed. When scanning, it is only useful if we can push
it down to reduce the amount of data we read from disk; otherwise a filter node
is sufficient.
A projection should not be needed either. When scanning, it is only useful if
we can push it down to reduce the amount of data we read from disk (e.g. by
choosing which columns to read); otherwise a project node is sufficient.
The only parameter that probably makes sense is batch size.
If you want to use InMemoryDataset then that is one possible implementation.
You can just hide the creation of ScanOptions from the user and create your own
default ScanOptions with the default projection and no filter.
Otherwise you can create a record batch reader from a table. I think we have
examples of how to expose a record batch reader as a generator, but you would
need to do your own slicing (for batch size) on top of that.
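A rough sketch of what "do your own slicing on top of a generator" could look
like is below. It deliberately avoids Arrow types so it stays self-contained:
the table is modeled as a plain std::vector<int>, and Rows, BatchGenerator and
MakeSlicingGenerator are hypothetical names made up for illustration, not
Arrow APIs. A real implementation would presumably hand out slices of the
table rather than copies.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Stand-in for a table: a single column of ints (assumption for the sketch).
using Rows = std::vector<int>;

// Pull-style generator: each call yields the next batch, or std::nullopt
// once the input is exhausted.
using BatchGenerator = std::function<std::optional<Rows>()>;

// Hypothetical helper: wraps `rows` in a generator that yields consecutive
// slices of at most `batch_size` rows each.
BatchGenerator MakeSlicingGenerator(Rows rows, std::size_t batch_size) {
  assert(batch_size > 0);
  return [rows = std::move(rows), offset = std::size_t{0},
          batch_size]() mutable -> std::optional<Rows> {
    if (offset >= rows.size()) {
      return std::nullopt;  // end of stream
    }
    // The last batch may be shorter than batch_size.
    const std::size_t length = std::min(batch_size, rows.size() - offset);
    const auto first = rows.begin() + static_cast<std::ptrdiff_t>(offset);
    Rows batch(first, first + static_cast<std::ptrdiff_t>(length));
    offset += length;
    return batch;
  };
}
```

The only user-facing knob here is the batch size, which matches the point
above that it is probably the only parameter that makes sense for a table
source.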
> [C++] Easy options to create a source node from a table
> -------------------------------------------------------
>
> Key: ARROW-15258
> URL: https://issues.apache.org/jira/browse/ARROW-15258
> Project: Apache Arrow
> Issue Type: Sub-task
> Components: C++
> Reporter: Weston Pace
> Assignee: Vibhatha Lakmal Abeykoon
> Priority: Major
>
> Given a Table there should be a very simple way to create a source node.
> Something like:
> {code}
> std::shared_ptr<Table> table = ...;
> ARROW_RETURN_NOT_OK(arrow::compute::MakeExecNode(
>     "table", plan, {}, arrow::compute::TableSourceOptions{table.get()}));
> {code}