[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive

Carl Steinbach (JIRA) Sun, 08 Dec 2013 16:09:28 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842804#comment-13842804
 ]


Carl Steinbach commented on HIVE-5783:
--------------------------------------

[~brocknoland] Up to this point we have reserved first-class support for data 
formats in Hive (i.e., changes to the grammar) for formats that are implemented 
natively in the Hive source repository. I think we should maintain this 
convention. There are a couple of options available if we feel that it's 
important for users to be able to create Parquet-formatted tables using the 
abbreviated syntax:

# Add a format registry feature to Hive that allows admins to register 
third-party SerDe implementations and associate them with a format keyword that 
users can reference in a DDL statement.
# Maintain two copies of the Parquet SerDe implementation, one in Hive and 
one in the parquet-mr repository, and backport patches between these 
repositories as necessary. If users want to use the parquet-mr version of the 
SerDe with Hive, they may do so by referencing the third-party package name in 
their DDL.
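As a rough illustration of the first option, the sketch below models a registry that maps an admin-chosen keyword to the three class names a long-form DDL statement needs (SerDe, input format, output format), and expands `STORED AS <keyword>` into that long form. Everything here is hypothetical: `FormatRegistry`, `FormatEntry`, `register`, and `expand` are invented for illustration and are not Hive APIs. The Parquet class names in the usage lines are assumed to match those shipped in parquet-mr's `parquet-hive` module around this time and should be checked against the release actually in use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormatEntry:
    """Fully qualified class names an admin binds to one format keyword."""
    serde: str
    input_format: str
    output_format: str


class FormatRegistry:
    """Hypothetical keyword -> third-party SerDe registry (not a Hive API)."""

    def __init__(self) -> None:
        self._entries: dict[str, FormatEntry] = {}

    def register(self, keyword: str, entry: FormatEntry) -> None:
        # HiveQL keywords are case-insensitive, so normalize before storing.
        key = keyword.upper()
        if key in self._entries:
            raise ValueError(f"format {key} is already registered")
        self._entries[key] = entry

    def expand(self, keyword: str) -> str:
        # Rewrite the abbreviated "STORED AS <keyword>" into the long-form
        # clause that users would otherwise have to type by hand.
        entry = self._entries.get(keyword.upper())
        if entry is None:
            raise KeyError(f"unknown storage format: {keyword}")
        return (
            f"ROW FORMAT SERDE '{entry.serde}' "
            f"STORED AS INPUTFORMAT '{entry.input_format}' "
            f"OUTPUTFORMAT '{entry.output_format}'"
        )


# Usage: an admin registers the parquet-mr classes (assumed names) once,
# after which any user DDL can say STORED AS PARQUET.
registry = FormatRegistry()
registry.register("parquet", FormatEntry(
    serde="parquet.hive.serde.ParquetHiveSerDe",
    input_format="parquet.hive.DeprecatedParquetInputFormat",
    output_format="parquet.hive.DeprecatedParquetOutputFormat",
))
print(registry.expand("PARQUET"))
```

Keying the registry on an upper-cased keyword mirrors HiveQL's case-insensitive keywords, and refusing duplicate registrations keeps one admin from silently shadowing a format another admin already bound.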

On a side note, I think the ticket summary "Native Parquet Support in Hive" is 
misleading. Users who see this description in the release notes will conclude 
that the Parquet SerDe code lives in Hive, when the exact opposite is true.

> Native Parquet Support in Hive
> ------------------------------
>
>                 Key: HIVE-5783
>                 URL: https://issues.apache.org/jira/browse/HIVE-5783
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Justin Coffey
>            Assignee: Justin Coffey
>            Priority: Minor
>             Fix For: 0.11.0
>
>         Attachments: HIVE-5783.patch, hive-0.11-parquet.patch
>
>
> Problem Statement:
> Hive would be easier to use if it had native Parquet support. Our 
> organization, Criteo, uses Hive extensively. Therefore, we built the Parquet 
> Hive integration and would now like to contribute that integration to Hive.
> About Parquet:
> Parquet is a columnar storage format for Hadoop and integrates with many 
> Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, 
> Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native 
> Parquet integration.
> Change Details:
> Parquet was built with dependency management in mind, and therefore only a 
> single Parquet jar will be added as a dependency.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
