[ 
https://issues.apache.org/jira/browse/HAWQ-886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15362999#comment-15362999
 ] 

Shivram Mani commented on HAWQ-886:
-----------------------------------

Results of an initial performance evaluation of reading ORC files with 
individual features turned on/off:
Data: ~500,000 rows, 6 columns with primitive data types

Read using naive row-based reader ~ 1500ms
Read using vectorized batch reader (default batch size of 1024) ~ 1000ms
Read with filter (7500 rows) ~ 750ms
Read without filter with column projection ~ 850ms
Read with filter with column projection ~ 600ms

Overall, we can achieve roughly a 60% reduction in read time (1500ms down to 
600ms, about 2.5x faster), even on a rather small dataset.
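
For reference, the relative gains can be derived from the timings above with a 
few lines of Python. This is only a sketch of the arithmetic: the labels and 
helper names are mine, and the naive row-based read is assumed to be the 
baseline for every comparison.

```python
# Timings in milliseconds, copied from the measurements in this comment.
# The dictionary keys are informal labels, not PXF or ORC identifiers.
timings_ms = {
    "row-based reader": 1500,
    "vectorized batch reader": 1000,
    "filter": 750,
    "column projection": 850,
    "filter + column projection": 600,
}

# Assumption: every configuration is compared against the row-based read.
BASELINE_MS = timings_ms["row-based reader"]


def reduction_pct(ms, base=BASELINE_MS):
    """Percentage of read time saved relative to the baseline."""
    return 100.0 * (base - ms) / base


def speedup_factor(ms, base=BASELINE_MS):
    """How many times faster the read is than the baseline."""
    return base / ms


for name, ms in timings_ms.items():
    print(f"{name}: {reduction_pct(ms):.0f}% less time, "
          f"{speedup_factor(ms):.2f}x")
```

With filter and column projection combined, this gives a 60% reduction in read 
time, which is the same thing as a 2.5x speedup.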

> Support PXF filter push down for ORC
> ------------------------------------
>
>                 Key: HAWQ-886
>                 URL: https://issues.apache.org/jira/browse/HAWQ-886
>             Project: Apache HAWQ
>          Issue Type: New Feature
>          Components: PXF
>            Reporter: Shivram Mani
>            Assignee: Shivram Mani
>             Fix For: 2.1.0
>
>
> Currently PXF via the Hive profile doesn’t pass any of the filter information 
> down while accessing ORC files. The only filter that is possible right now is 
> at the level of partition and is generically done for all Hive tables.
> ORC internally contains file-level, stripe-level and row-level statistics 
> including information such as min and max values. For more information refer 
> to https://orc.apache.org/docs/indexes.html
> The proposal here is to possibly introduce a new profile optimized for ORC 
> files and to leverage these stats to improve the performance of HAWQ queries 
> with predicates.
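
To illustrate why min/max statistics help with predicates, here is a small 
conceptual sketch in Python. It is not the ORC reader API and not the proposed 
PXF implementation: StripeStats and the two helper functions are hypothetical 
names, and the stripe values are made up for the example. The idea is only that 
a stripe whose [min, max] range cannot satisfy the predicate never has to be 
read.

```python
from dataclasses import dataclass


@dataclass
class StripeStats:
    """Hypothetical stand-in for the statistics of one column in one stripe."""
    min_value: int
    max_value: int
    row_count: int


def may_match_equals(stats, value):
    """A stripe can contain `col = value` only if value lies in [min, max]."""
    return stats.min_value <= value <= stats.max_value


def may_match_greater_than(stats, value):
    """A stripe can contain `col > value` only if its max exceeds value."""
    return stats.max_value > value


# Made-up statistics for three stripes of 1000 rows each.
stripes = [
    StripeStats(min_value=0, max_value=99, row_count=1000),
    StripeStats(min_value=100, max_value=199, row_count=1000),
    StripeStats(min_value=200, max_value=299, row_count=1000),
]

# Predicate: col > 150. Stripe 0 is skipped because its max (99) is too small.
to_read = [i for i, s in enumerate(stripes) if may_match_greater_than(s, 150)]
rows_skipped = sum(s.row_count for i, s in enumerate(stripes)
                   if i not in to_read)

print("stripes to read:", to_read)
print("rows skipped without any I/O:", rows_skipped)
```

The statistics can only rule stripes out. Rows in the remaining stripes still 
have to be checked against the predicate, since a matching range does not 
guarantee a matching row.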



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
