[ 
https://issues.apache.org/jira/browse/HADOOP-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616271#action_12616271
 ] 

Joydeep Sen Sarma commented on HADOOP-3601:
-------------------------------------------

synced up with a few folks working on this internally. in a nutshell - the 
contributors seem to like the idea of making this a contrib project to begin 
with. 

the sub-project requirements (in terms of PMC involvement) are fairly rigorous 
and would probably extend the timeline of releasing hive into the hadoop 
ecosystem. that is our primary concern at this time. as the project matures - 
it's possible/likely that a sub-project designation will become more appropriate.

to address the concerns about email traffic on core-dev - we had a suggestion. 
if we can put the 'component' field in the email header (Pete found this useful 
link: http://www.atlassian.com/software/jira/docs/latest/emailcontent.html) - 
then client-side mail filtering should be able to isolate hive jira traffic 
from that of hadoop (or other contrib projects). there have already been 
suggestions on this thread about not letting contrib test failures block 
acceptance of patches - and that would probably alleviate the other major 
concern around slowing core development down. would these address most of the 
concerns that are motivating the sandbox/sub-project discussion?
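
as a rough illustration of the client-side filtering idea: a minimal sketch, 
assuming the jira mail template were customized to emit the component in a 
header. the header name `X-JIRA-Component` and the component value 
`contrib/hive` are both hypothetical here - the real names would depend on how 
the template is set up.

```python
from email import message_from_string

# Hypothetical header name: assumes the JIRA email template has been
# customized to include the issue's component. Not a confirmed JIRA header.
COMPONENT_HEADER = "X-JIRA-Component"


def is_component_mail(raw_message, component="contrib/hive"):
    """Return True if the message lists `component` in the assumed header.

    `contrib/hive` is a placeholder component name for illustration.
    """
    msg = message_from_string(raw_message)
    value = msg.get(COMPONENT_HEADER, "")
    # An issue may carry several components; assume a comma-separated list.
    components = [c.strip().lower() for c in value.split(",")]
    return component.lower() in components


sample = (
    "From: jira@apache.org\n"
    "Subject: [jira] Commented: (HADOOP-3601) Hive as a contrib project\n"
    "X-JIRA-Component: contrib/hive\n"
    "\n"
    "comment body\n"
)

print(is_component_mail(sample))
```

a mail client rule matching on the same header would do the equivalent without 
any code; the function just makes the matching logic explicit.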

i don't think we will see a lot of traffic on the core-users mailing list (based 
on the follow-up traffic from Ashish's posting of the hive language tutorial) - 
but we will just have to see how that turns out.

> Hive as a contrib project
> -------------------------
>
>                 Key: HADOOP-3601
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3601
>             Project: Hadoop Core
>          Issue Type: New Feature
>    Affects Versions: 0.17.0
>            Reporter: Joydeep Sen Sarma
>            Priority: Minor
>         Attachments: HiveTutorial.pdf
>
>   Original Estimate: 1080h
>  Remaining Estimate: 1080h
>
> Hive is a data warehouse built on top of flat files (stored primarily in 
> HDFS). It includes:
> - Data Organization into Tables with logical and hash partitioning
> - A Metastore to store metadata about Tables/Partitions etc
> - A SQL like query language over object data stored in Tables
> - DDL commands to define and load external data into tables
> Hive's query language is executed using Hadoop map-reduce as the execution 
> engine. Queries can use either single stage or multi-stage map-reduce. Hive 
> has a native format for tables - but can handle any data set (for example 
> json/thrift/xml) using an IO library framework.
> Hive uses Antlr for query parsing, Apache JEXL for expression evaluation and 
> may use Apache Derby as an embedded database for MetaStore. Antlr has a BSD 
> license and should be compatible with Apache license.
> We are currently thinking of contributing to the 0.17 branch as a contrib 
> project (since that is the version under which it will get tested internally) 
> - but looking for advice on the best release path.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
