[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819170#comment-13819170 ]
Eric Hanson commented on HIVE-5783:
-----------------------------------

One thing you may want to consider is adding a vectorized InputFormat for Parquet that works with Hive's vectorized query execution capability. This should give you faster query execution over Parquet on Hive, since vectorization dovetails well with columnar storage formats. The vectorization code currently supports ORC, but the design of vectorized execution is independent of the physical data storage format. The rules for a vectorized iterator are described in the section "Vectorized Iterator" in the latest design document attached to https://issues.apache.org/jira/browse/HIVE-4160. By looking at that section of the design document and the vectorized iterator source code for ORC, you should be able to determine how to add a vectorized iterator for Parquet (a rough sketch of the shape such a reader could take follows the quoted issue below).

> Native Parquet Support in Hive
> ------------------------------
>
>          Key: HIVE-5783
>          URL: https://issues.apache.org/jira/browse/HIVE-5783
>      Project: Hive
>   Issue Type: New Feature
>     Reporter: Justin Coffey
>     Priority: Minor
>
> Problem Statement:
> Hive would be easier to use if it had native Parquet support. Our organization, Criteo, uses Hive extensively. Therefore we built the Parquet-Hive integration and would now like to contribute that integration to Hive.
>
> About Parquet:
> Parquet is a columnar storage format for Hadoop that integrates with many Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native Parquet integration.
>
> Change Details:
> Parquet was built with dependency management in mind, and therefore only a single Parquet jar will be added as a dependency.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
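For reference, here is a rough sketch of the kind of vectorized reader described in the comment above, following the pattern of the ORC vectorized reader: it repacks rows into VectorizedRowBatch objects that Hive's vectorized operators consume. Only the VectorizedRowBatch and mapred RecordReader handling mirrors the real Hive API; the ParquetRowReader interface, the package, and all class names are illustrative placeholders (a real implementation would read Parquet column chunks directly rather than going through a row-wise reader, and would handle all column types, not just longs).

{code:java}
package org.example.parquet.vector;  // illustrative package, not the actual patch

import java.io.IOException;

import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapred.RecordReader;

/**
 * Sketch of a reader that repacks rows from an underlying (row-wise) Parquet
 * reader into VectorizedRowBatch objects of up to 1024 rows, which is what the
 * downstream vectorized operators expect.
 */
public class VectorizedParquetRecordReader
    implements RecordReader<NullWritable, VectorizedRowBatch> {

  private final ParquetRowReader rowReader;   // hypothetical row-wise reader
  private final int numColumns;

  public VectorizedParquetRecordReader(ParquetRowReader rowReader, int numColumns) {
    this.rowReader = rowReader;
    this.numColumns = numColumns;
  }

  @Override
  public boolean next(NullWritable key, VectorizedRowBatch batch) throws IOException {
    batch.size = 0;
    // Fill the batch until it is full or the split is exhausted.
    Object[] row;
    while (batch.size < VectorizedRowBatch.DEFAULT_SIZE
        && (row = rowReader.readNextRow()) != null) {
      for (int c = 0; c < numColumns; c++) {
        // Assumes long columns purely to keep the sketch short.
        LongColumnVector col = (LongColumnVector) batch.cols[c];
        if (row[c] == null) {
          col.noNulls = false;
          col.isNull[batch.size] = true;
        } else {
          col.vector[batch.size] = ((Number) row[c]).longValue();
        }
      }
      batch.size++;
    }
    return batch.size > 0;
  }

  @Override
  public NullWritable createKey() {
    return NullWritable.get();
  }

  @Override
  public VectorizedRowBatch createValue() {
    // One batch is reused across calls to next().
    VectorizedRowBatch batch = new VectorizedRowBatch(numColumns);
    for (int c = 0; c < numColumns; c++) {
      batch.cols[c] = new LongColumnVector();
    }
    return batch;
  }

  @Override
  public long getPos() throws IOException {
    return rowReader.getPos();
  }

  @Override
  public float getProgress() throws IOException {
    return rowReader.getProgress();
  }

  @Override
  public void close() throws IOException {
    rowReader.close();
  }

  /** Hypothetical stand-in for the existing row-wise Parquet reader. */
  public interface ParquetRowReader {
    Object[] readNextRow() throws IOException;
    long getPos() throws IOException;
    float getProgress() throws IOException;
    void close() throws IOException;
  }
}
{code}

The ORC path wraps a reader like this in an InputFormat parameterized on <NullWritable, VectorizedRowBatch>; a Parquet equivalent would presumably do the same, alongside the existing row-mode InputFormat.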