[ 
https://issues.apache.org/jira/browse/HADOOP-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615970#action_12615970
 ] 

eric baldeschwieler commented on HADOOP-3601:
---------------------------------------------

After having dealt with the issues of HBase in contrib, I really like the 
sandbox approach.  It addresses the many repeated challenges we experienced 
with HBase.

The idea, I think, is that all or at least most contrib projects would go there.  
We would get them off the primary lists and would have a clearer separation on 
Hudson, mail, etc.  We could call the sub-project contrib, commons, incubator, 
or whatever, to make it clear that it is a place for nascent sub-projects 
that are not part of the Hadoop core code.  It's hard for me to understand why 
we would check complete systems built on top of Hadoop, like Hive, into core.

Without some process changes like sandbox, I'm against bringing Hive into 
contrib, since it will add overhead to core Hadoop work.  But I really want us 
to find a way to encourage this and many more projects that build on Hadoop to 
share their work with the community.

The argument that we should put Hive in contrib because it is easier than going 
to SourceForge or Google Code really alarms me!  Starting a project on those 
sites is trivial and requires a lot less commitment than signing up to be a 
good member of the Apache Hadoop community.  Separating the mailing lists and 
builds of contrib from core would reduce the impact of such projects on the 
core Hadoop community substantially, but not to zero.  You are signing up for a 
lot more by putting your project here than on these other sites!

See:
- http://incubator.apache.org/learn/theapacheway.html
- http://wiki.apache.org/hadoop/HowToContribute



> Hive as a contrib project
> -------------------------
>
>                 Key: HADOOP-3601
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3601
>             Project: Hadoop Core
>          Issue Type: New Feature
>    Affects Versions: 0.17.0
>            Reporter: Joydeep Sen Sarma
>            Priority: Minor
>         Attachments: HiveTutorial.pdf
>
>   Original Estimate: 1080h
>  Remaining Estimate: 1080h
>
> Hive is a data warehouse built on top of flat files (stored primarily in 
> HDFS). It includes:
> - Data Organization into Tables with logical and hash partitioning
> - A Metastore to store metadata about Tables/Partitions etc.
> - A SQL-like query language over object data stored in Tables
> - DDL commands to define and load external data into tables
> Hive's query language is executed using Hadoop map-reduce as the execution 
> engine. Queries can use either single stage or multi-stage map-reduce. Hive 
> has a native format for tables - but can handle any data set (for example 
> json/thrift/xml) using an IO library framework.
> Hive uses Antlr for query parsing, Apache JEXL for expression evaluation, and 
> may use Apache Derby as an embedded database for the Metastore. Antlr has a 
> BSD license, which should be compatible with the Apache license.
> We are currently thinking of contributing to the 0.17 branch as a contrib 
> project (since that is the version under which it will get tested internally) 
> - but looking for advice on the best release path.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
