[ https://issues.apache.org/jira/browse/ARROW-10524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422448#comment-17422448 ]
Weston Pace edited comment on ARROW-10524 at 9/29/21, 11:46 PM:
----------------------------------------------------------------
I don't really like the flag on the fragment (see
https://github.com/apache/arrow/pull/10913#discussion_r699822754). Pushing
down some computation is ok and we have mechanisms for it. For example,
pushing down a filter is fine and the mechanism is the guarantee.
Pushing down projection is not generally a good idea. For example, consider a
query with an ORDER BY where the sort key column is dropped by the projection.
On the other hand, fragments do need to be able to project/cast from the file
schema to the dataset schema, but that is a different problem.
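To make the projection concern concrete, here is a minimal sketch in plain
Python (not the Arrow dataset API). The rows and the project/order_by helpers
are hypothetical and exist only for illustration. If the projection runs below
the sort, the sort key is already gone by the time the sort needs it:

```python
# Hypothetical rows standing in for a fragment's data.
rows = [
    {"id": 3, "name": "c"},
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
]


def project(rows, columns):
    """Keep only the requested columns of each row."""
    return [{c: r[c] for c in columns} for r in rows]


def order_by(rows, key):
    """Sort rows by a single column."""
    return sorted(rows, key=lambda r: r[key])


# Query: SELECT name ... ORDER BY id
# Sorting first and projecting afterwards works.
correct = project(order_by(rows, "id"), ["name"])

# Projecting first (i.e. pushed down into the fragment) drops "id",
# so the sort above it can no longer find its key.
try:
    order_by(project(rows, ["name"]), "id")
    pushed_down_failed = False
except KeyError:
    pushed_down_failed = True
```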
For more general computation we are venturing into the realm of a distributed
query engine and not a fragment or file format. As another example, consider
an ORDER BY. You can push down the sort, but then you have to do a
corresponding merge. That might make sense if all your leaves can handle sort,
but if only some of them can, then I don't know if there is much merit in
getting back some batches sorted and others unsorted.
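A rough sketch of the merge concern, again in plain Python rather than Arrow
code. The leaf lists are made-up data, and heapq.merge stands in for whatever
merge step a real engine would use:

```python
import heapq

# Hypothetical output of three leaves. Two of them could sort, one could not.
sorted_leaf_a = [1, 4, 7]
sorted_leaf_b = [2, 3, 9]
unsorted_leaf = [8, 0, 5]

# If every leaf returns sorted data, a streaming k-way merge is enough.
merged = list(heapq.merge(sorted_leaf_a, sorted_leaf_b))

# If some leaves cannot sort, the consumer has to sort their output itself
# before merging, so it needs a full sort implementation anyway.
mixed = list(
    heapq.merge(sorted_leaf_a, sorted_leaf_b, sorted(unsorted_leaf))
)
```

In the mixed case the consumer carries both a sort and a merge, which is the
reason partial sort pushdown looks like a doubtful win.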
> [C++][Dataset] Add FlightFragment
> ---------------------------------
>
> Key: ARROW-10524
> URL: https://issues.apache.org/jira/browse/ARROW-10524
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Affects Versions: 2.0.0
> Reporter: Ben Kietzman
> Assignee: Ben Kietzman
> Priority: Major
> Labels: dataset
> Fix For: 6.0.0
>
>
> Allow wrapping a flight service as a dataset/fragment