[
https://issues.apache.org/jira/browse/DRILL-8526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17986997#comment-17986997
]
ASF GitHub Bot commented on DRILL-8526:
---------------------------------------
cgivre commented on PR #2995:
URL: https://github.com/apache/drill/pull/2995#issuecomment-3019960154
> > > I submitted some minor changes. Just a reminder, but we really need
> > > unit tests in order to merge this.
> > > Also, have you considered adding a limit pushdown? It is usually
> > > pretty easy to do and only involves:
> > >
> > > * Implementing two methods in the group scan (`HiveScan`):
> > >   `supportsLimitPushdown` and `applyLimit`.
> > > * Passing the limit through the subscans.
> > > * Adding some logic in the readers to stop when the limit is reached.
> > >
> > > Maybe it would be best to open a new JIRA for that, but IMHO, it is
> > > one of the easiest and most effective pushdowns that can be implemented,
> > > yet Drill does not seem to do it for all the plugins.
> >
> >
> > @cgivre ok, I will add some unit tests and support limit pushdown
@shfshihuafeng Just to be clear, we can't merge this without unit tests.
You don't have to do the limit pushdown if you don't want to or aren't able
to, but I just wanted to recommend it as it can be a big performance benefit
for not much work. Alternatively, you can create a separate pull request for
that and do it later. It is your choice.
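
The three limit-pushdown steps quoted above could be sketched roughly as follows. This is a minimal, self-contained illustration and not Drill's actual code: `SketchGroupScan`, `SketchSubScan`, and `read` are hypothetical stand-ins, only the method names `supportsLimitPushdown` and `applyLimit` come from the comment, and returning `null` when the limit is unchanged is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins; none of these classes are Drill's real ones.
public class LimitPushdownSketch {

  /** Step 1: the group scan advertises limit pushdown and accepts a limit. */
  public static final class SketchGroupScan {
    private final int maxRecords; // -1 means no limit has been pushed down

    public SketchGroupScan(int maxRecords) {
      this.maxRecords = maxRecords;
    }

    public boolean supportsLimitPushdown() {
      return true;
    }

    /** Returns a copy carrying the limit, or null when nothing would change (assumption). */
    public SketchGroupScan applyLimit(int limit) {
      if (limit == maxRecords) {
        return null;
      }
      return new SketchGroupScan(limit);
    }

    /** Step 2: pass the limit on to the subscan. */
    public SketchSubScan toSubScan() {
      return new SketchSubScan(maxRecords);
    }
  }

  /** Carries the limit from the planner-side scan to the reader. */
  public static final class SketchSubScan {
    public final int maxRecords;

    public SketchSubScan(int maxRecords) {
      this.maxRecords = maxRecords;
    }
  }

  /** Step 3: the reader stops producing rows once the limit is reached. */
  public static List<String> read(SketchSubScan subScan, List<String> rows) {
    List<String> out = new ArrayList<>();
    for (String row : rows) {
      if (subScan.maxRecords >= 0 && out.size() >= subScan.maxRecords) {
        break; // limit reached: stop reading early
      }
      out.add(row);
    }
    return out;
  }

  public static void main(String[] args) {
    SketchGroupScan limited = new SketchGroupScan(-1).applyLimit(2);
    List<String> rows = List.of("a", "b", "c", "d");
    System.out.println(read(limited.toSubScan(), rows));
  }
}
```

The benefit comes from step 3: with the limit available in the reader, a `LIMIT 2` query no longer has to scan the rest of the table.
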
> Hive Predicate Push Down for ORC and Parquet
> --------------------------------------------
>
> Key: DRILL-8526
> URL: https://issues.apache.org/jira/browse/DRILL-8526
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Hive
> Affects Versions: 1.22.0
> Reporter: shihuafeng
> Priority: Major
> Fix For: 1.23.0
>
> Attachments: image-2025-06-24-18-08-34-427.png,
> image-2025-06-24-18-08-54-768.png
>
>
> Drill does not support filter pushdown for the ORC format. I implemented and
> tested it. When a large amount of data is filtered out, predicate pushdown can
> significantly improve the query performance of the ORC format.
> In comparative testing of the following TPC-H SQL query, the ORC format
> with filter pushdown achieves roughly a 5-20x performance improvement over
> execution without pushdown.
> SQL: select * from hive.lineitem_o where L_ORDERKEY=1;
> Rows in table lineitem_o: 6001215
> Without pushdown:
> !image-2025-06-24-18-08-34-427.png!
> With pushdown:
> !image-2025-06-24-18-08-54-768.png!
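
To illustrate why pushing a predicate such as `L_ORDERKEY=1` into an ORC reader can help, here is a minimal sketch. It does not show the PR's implementation; it assumes the general ORC idea of skipping stripes whose min/max column statistics cannot match the predicate. `StripeStats`, `rowsToRead`, and the stripe values in the usage note are hypothetical.

```java
import java.util.List;

// Hypothetical illustration of statistics-based stripe skipping; not the PR's code.
public class StripeSkippingSketch {

  /** Hypothetical per-stripe summary of one column: min, max, and row count. */
  public static final class StripeStats {
    public final long min;
    public final long max;
    public final long rowCount;

    public StripeStats(long min, long max, long rowCount) {
      this.min = min;
      this.max = max;
      this.rowCount = rowCount;
    }
  }

  /**
   * Counts the rows that still have to be read for the predicate "column = value".
   * A stripe is skipped when value lies outside its [min, max] range.
   */
  public static long rowsToRead(List<StripeStats> stripes, long value) {
    long rows = 0;
    for (StripeStats s : stripes) {
      if (value >= s.min && value <= s.max) {
        rows += s.rowCount; // stripe may contain matches, so it must be read
      }
    }
    return rows;
  }
}
```

With three made-up stripes covering keys 1-1000, 1001-2000, and 2001-3000, a lookup for key 1 would only need to read the first stripe; without pushdown, all three would be scanned and filtered afterwards.
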