[ https://issues.apache.org/jira/browse/HADOOP-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615701#action_12615701 ]

Joydeep Sen Sarma commented on HADOOP-3601:
-------------------------------------------

Ideally we would like it to be in contrib, for the same reasons that Owen 
outlined:

- easy to get started (low setup cost)
- Hadoop APIs are not frozen yet - so being part of the tree and having 
regression tests run regularly against Hadoop trunk makes it easy for us to 
respond to API changes. For the same reason, we like Doug's idea of running 
contrib tests via Hudson as a separate nightly job
- we are not set on becoming a TLP - we just want to get it out there.

The point about swamping the core-dev list with contrib JIRAs is well taken. 
Would it be possible to have a separate mailing list for contrib projects (at 
least the high-volume ones)? It would also benefit the contrib authors by 
sparing them from having to parse tons of core Hadoop JIRAs.

At this point we have also invested a lot of effort in fitting into the contrib 
source tree model, so the SourceForge model sounds a little daunting (I imagine 
ZooKeeper is more or less independent of Hadoop, but Hive is totally 
intertwined with map-reduce/DFS).

> Hive as a contrib project
> -------------------------
>
>                 Key: HADOOP-3601
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3601
>             Project: Hadoop Core
>          Issue Type: New Feature
>    Affects Versions: 0.17.0
>            Reporter: Joydeep Sen Sarma
>            Priority: Minor
>         Attachments: HiveTutorial.pdf
>
>   Original Estimate: 1080h
>  Remaining Estimate: 1080h
>
> Hive is a data warehouse built on top of flat files (stored primarily in 
> HDFS). It includes:
> - Data organization into Tables with logical and hash partitioning
> - A Metastore to store metadata about Tables/Partitions, etc.
> - A SQL-like query language over object data stored in Tables
> - DDL commands to define and load external data into tables (see the 
> illustrative sketch after this description)
> Hive's query language is executed using Hadoop map-reduce as the execution 
> engine. Queries can use either single-stage or multi-stage map-reduce. Hive 
> has a native format for tables, but can handle any data set (for example 
> JSON/Thrift/XML) using an IO library framework.
> Hive uses ANTLR for query parsing, Apache JEXL for expression evaluation, and 
> may use Apache Derby as an embedded database for the MetaStore. ANTLR has a 
> BSD license and should be compatible with the Apache license.
> We are currently thinking of contributing to the 0.17 branch as a contrib 
> project (since that is the version under which it will get tested internally), 
> but we are looking for advice on the best release path.
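
To make the above concrete, here is a rough sketch of the kind of DDL, load, 
and query statements the description refers to. This is illustrative only: the 
table name, columns, and partition key are hypothetical, and the exact syntax 
Hive supports may differ.

    -- define a table of flat-file data, partitioned by date (hypothetical schema)
    CREATE TABLE page_views (view_time INT, user_id BIGINT, url STRING)
        PARTITIONED BY (dt STRING);

    -- load an external flat file from HDFS into one partition
    LOAD DATA INPATH '/data/page_views/2008-06-01'
        INTO TABLE page_views PARTITION (dt = '2008-06-01');

    -- a SQL-like query, executed as one or more Hadoop map-reduce jobs
    SELECT dt, COUNT(1)
    FROM page_views
    WHERE dt >= '2008-06-01'
    GROUP BY dt;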

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
