[
https://issues.apache.org/jira/browse/HAWQ-886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15362999#comment-15362999
]
Shivram Mani commented on HAWQ-886:
-----------------------------------
Results of an initial performance evaluation of reading ORC files with
individual features turned on and off:
Data: ~500,000 rows, 6 columns of primitive data types
Read using naive row-based reader: ~1500 ms
Read using vectorized batch reader (default batch size 1024): ~1000 ms
Read with filter (7500 rows): ~750 ms
Read without filter, with column projection: ~850 ms
Read with filter and column projection: ~600 ms
Overall, read time drops by roughly 60% (from ~1500 ms to ~600 ms), even on a
rather small dataset.
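As a quick check of the arithmetic behind that figure, the sketch below computes the reduction in read time and the equivalent speedup factor for each configuration, relative to the naive row-based reader. It uses only the timings listed above; the dictionary keys and function names are labels chosen here for illustration.

```python
# Timings in milliseconds, copied from the measurements above.
# The keys are illustrative labels, not names used by PXF or ORC.
timings_ms = {
    "row_reader": 1500,
    "vectorized": 1000,
    "filter": 750,
    "projection": 850,
    "filter_and_projection": 600,
}

baseline = timings_ms["row_reader"]


def reduction_pct(ms, base=baseline):
    """Percentage reduction in read time relative to the baseline."""
    return round(100 * (base - ms) / base, 1)


def speedup_factor(ms, base=baseline):
    """How many times faster than the baseline a configuration is."""
    return round(base / ms, 2)


for name, ms in timings_ms.items():
    print(f"{name}: {reduction_pct(ms)}% less time, {speedup_factor(ms)}x")
```

With these numbers, filter plus column projection takes 60% less time than the row-based reader, which is the same as a 2.5x speedup.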
> Support PXF filter push down for ORC
> ------------------------------------
>
> Key: HAWQ-886
> URL: https://issues.apache.org/jira/browse/HAWQ-886
> Project: Apache HAWQ
> Issue Type: New Feature
> Components: PXF
> Reporter: Shivram Mani
> Assignee: Shivram Mani
> Fix For: 2.1.0
>
>
> Currently PXF via the Hive profile doesn’t pass any of the filter information
> down while accessing ORC files. The only filter that is possible right now is
> at the level of partition and is generically done for all Hive tables.
> ORC internally contains file-level, stripe-level and row-level statistics,
> including information such as min and max values. For more information refer
> to https://orc.apache.org/docs/indexes.html
> The proposal here is to possibly introduce a new profile optimized for ORC
> files and to leverage these stats to improve the performance of HAWQ queries
> with predicates.
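To illustrate why min/max statistics help queries with predicates, here is a minimal sketch in Python. It is not the PXF or ORC implementation: `StripeStats` and `stripes_to_read` are hypothetical names, and the statistics are made-up values for a single integer column. The point is only that a stripe whose [min, max] range cannot satisfy the predicate does not need to be read.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StripeStats:
    """Hypothetical per-stripe statistics for one integer column."""
    stripe_id: int
    min_value: int
    max_value: int


def stripes_to_read(stats: List[StripeStats], op: str, literal: int) -> List[int]:
    """Return ids of stripes that might hold rows matching `column <op> literal`."""

    def may_match(s: StripeStats) -> bool:
        if op == "=":
            return s.min_value <= literal <= s.max_value
        if op == "<":
            return s.min_value < literal
        if op == ">":
            return s.max_value > literal
        # Unknown operator: be conservative and read the stripe.
        return True

    return [s.stripe_id for s in stats if may_match(s)]


# Made-up statistics for three stripes.
stats = [StripeStats(0, 1, 100), StripeStats(1, 101, 200), StripeStats(2, 201, 300)]
print(stripes_to_read(stats, "=", 150))  # only stripe 1 can contain 150
print(stripes_to_read(stats, ">", 180))  # stripes 1 and 2 can contain values > 180
```

The check is deliberately one-sided: a stripe that passes may still contain no matching rows, so the predicate must still be applied to the rows that are read.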
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)