[ https://issues.apache.org/jira/browse/ARROW-11782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17291706#comment-17291706 ]
Neal Richardson commented on ARROW-11782:
-----------------------------------------
I would love to delete ScanTask from the R bindings. The reason it's exposed
there is to support a (hacky, experimental) attempt to do computations on the
stream of record batches, so that we can compute things we otherwise couldn't
because we can't hold the whole Table in memory. Scanner::ToBatches doesn't
work in that case because everything would be materialized.
What I _really_ want is to be able to essentially pass a function/lambda to
something like ToTable or ToBatches and have that function be applied to every
record batch in the stream. I don't want to manage consuming the
ScanTasks/RecordBatchIterators, I'd prefer to have the C++ library handle that.
(In my current hacky use of ScanTasks, it's actually prohibitively slow because
it has to consume the iterators single-threaded.)
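The kind of interface described above (hand a function to the library and let it apply that function to every batch in the stream, in parallel) can be sketched in a language-neutral way. The Python below is only an illustration and is not the Arrow C++ or R API. `map_batches`, `max_in_flight`, and `batch_stream` are hypothetical names, and the "batches" are plain lists of ints standing in for record batches. The helper pulls at most `max_in_flight` batches from the stream at a time, so the whole dataset is never materialized, and it runs `fn` on a thread pool rather than consuming the iterator single-threaded.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Callable, Iterable, Iterator, List, TypeVar

B = TypeVar("B")  # stand-in for a record batch
R = TypeVar("R")  # per-batch partial result

_DONE = object()  # sentinel marking the end of the batch stream


def map_batches(
    batches: Iterable[B],
    fn: Callable[[B], R],
    max_workers: int = 4,
    max_in_flight: int = 8,
) -> Iterator[R]:
    """Hypothetical helper: apply fn to every batch on a thread pool.

    At most max_in_flight batches are pulled from the stream at a time, so
    the full dataset is never materialized. Results are yielded in
    completion order, not stream order.
    """
    it = iter(batches)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pending = set()

        def refill() -> None:
            # Pull more batches until the in-flight limit is reached
            # or the stream is exhausted.
            while len(pending) < max_in_flight:
                batch = next(it, _DONE)
                if batch is _DONE:
                    return
                pending.add(pool.submit(fn, batch))

        refill()
        while pending:
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            pending -= done
            for fut in done:
                yield fut.result()  # re-raises any exception from fn
            refill()


def batch_stream(n_batches: int, batch_size: int) -> Iterator[List[int]]:
    # Hypothetical stream: each "batch" is a list of consecutive ints.
    for i in range(n_batches):
        start = i * batch_size
        yield list(range(start, start + batch_size))


# Sum 100 batches of 1000 ints without holding them all in memory at once.
total = sum(map_batches(batch_stream(100, 1000), sum))
print(total)  # 4999950000
```

Because results arrive in completion order, the per-batch function should produce partial results that can be combined in any order (sums, counts, minimums). An order-sensitive computation would need each batch's position passed along with it.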
> [GLib][Dataset] Remove bindings for internal classes
> ----------------------------------------------------
>
> Key: ARROW-11782
> URL: https://issues.apache.org/jira/browse/ARROW-11782
> Project: Apache Arrow
> Issue Type: Improvement
> Components: GLib
> Affects Versions: 3.0.0
> Reporter: Ben Kietzman
> Priority: Major
> Fix For: 4.0.0
>
>
> GLib and Ruby include bindings for internal classes such as ScanOptions,
> ScanContext, InMemoryScanTask, ScanTask, ... These are probably unnecessary
> and should be removed to present a simpler interface less prone to breakage
> under refactoring of the wrapped classes.
> https://github.com/apache/arrow/pull/9532/checks?check_run_id=1974229719#step:8:2071
--
This message was sent by Atlassian Jira
(v8.3.4#803005)