[ https://issues.apache.org/jira/browse/HAWQ-886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ed Espino closed HAWQ-886.
--------------------------
> Investigation of HAWQ/PXF support for ORC
> -----------------------------------------
>
> Key: HAWQ-886
> URL: https://issues.apache.org/jira/browse/HAWQ-886
> Project: Apache HAWQ
> Issue Type: New Feature
> Components: PXF
> Reporter: Shivram Mani
> Assignee: Shivram Mani
> Fix For: 2.1.0.0-incubating
>
>
> Currently, when HAWQ reads ORC files via PXF (using the default Hive
> profile), it does not push any filter information down to the
> underlying ORC reader. The only filtering possible right now is at the
> partition level, and it is applied generically to all Hive tables.
> ORC internally maintains file-level, stripe-level, and row-level statistics,
> including information such as min and max values. For more information, refer
> to https://orc.apache.org/docs/indexes.html
> The proposal here is to introduce a new PXF profile optimized for ORC files
> that leverages these statistics to improve the performance of HAWQ queries
> with predicates. We will also use the vectorized approach (VectorizedRowBatch)
> when reading, along with SearchArgument to build the filter, instead of the
> existing row-based reader, which is expensive.
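
To illustrate why min/max statistics help queries with predicates, here is a minimal, library-free sketch of stripe-level pruning. It is not the PXF or ORC implementation: the class StripePruningSketch, the Stripe type, and both methods are hypothetical names invented for this example, and a real reader would express the predicate through ORC's SearchArgument rather than a hand-written comparison.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names come from PXF or ORC.
class StripePruningSketch {

    // One stripe of a single long column, with its min/max statistics.
    // Assumes at least one value per stripe.
    static final class Stripe {
        final long min;
        final long max;
        final long[] values;

        Stripe(long... values) {
            this.values = values;
            this.min = Arrays.stream(values).min().getAsLong();
            this.max = Arrays.stream(values).max().getAsLong();
        }
    }

    // True if the predicate "column > threshold" could match a row in this stripe.
    static boolean mightMatchGreaterThan(Stripe stripe, long threshold) {
        return stripe.max > threshold;
    }

    // Returns the matching values; stripesRead[0] is incremented once per
    // stripe whose rows were actually scanned.
    static List<Long> scanGreaterThan(List<Stripe> stripes, long threshold, int[] stripesRead) {
        List<Long> result = new ArrayList<>();
        for (Stripe stripe : stripes) {
            if (!mightMatchGreaterThan(stripe, threshold)) {
                continue; // ruled out by statistics alone, rows never touched
            }
            stripesRead[0]++;
            for (long v : stripe.values) {
                if (v > threshold) {
                    result.add(v);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Stripe> stripes = Arrays.asList(
            new Stripe(1, 5, 9),
            new Stripe(10, 15, 19),
            new Stripe(20, 25, 29));
        int[] stripesRead = {0};
        List<Long> rows = scanGreaterThan(stripes, 18, stripesRead);
        System.out.println(rows + " read from " + stripesRead[0]
            + " of " + stripes.size() + " stripes");
    }
}
```

Without pushdown, every stripe is decoded and the filter is applied afterwards; with it, a stripe whose maximum cannot satisfy the predicate is skipped entirely. The same idea applies at the file and row-index levels mentioned in the description.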