[jira] Commented: (HADOOP-3601) Hive as a contrib project

Doug Cutting (JIRA) Wed, 23 Jul 2008 09:06:54 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616137#action_12616137
 ]


Doug Cutting commented on HADOOP-3601:
--------------------------------------

> Yes, a sandbox is still a Hadoop sub-project [ ... ]

Contrib is our de facto sandbox today.  I am not personally interested in 
setting up and managing a separate sandbox, nor have I yet heard from other 
volunteers on the PMC.

I also think we should separate these two issues: how to manage contrib/sandbox 
long-term, and how to import Hive short-term.

> Without some process changes like sandbox I'm against bringing hive into 
> contrib, since it will add overhead to core hadoop work.

If it does, then we could move it to a subproject.  As Owen stated, the reason 
such a move was delayed for HBase was that, as a Lucene subproject, Hadoop 
could not create sub-sub-projects.  But now, as a TLP, we can easily and 
quickly create sub-projects when they're needed.  So I don't see this as a 
major liability.

I still think the two viable options are contrib or sub-project.  I don't have 
a strong opinion.  It depends on what activity level we expect.  If it is to be 
relatively low-activity, a contrib module is appropriate.  If it has enough 
activity to support separate mailing lists, releases, etc., then a sub-project 
makes sense.  In either case, we'll first make a guess, then adapt if we've 
made a mistake.  A mistake here would be minor, not fatal or even critical.

> Hive as a contrib project
> -------------------------
>
>                 Key: HADOOP-3601
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3601
>             Project: Hadoop Core
>          Issue Type: New Feature
>    Affects Versions: 0.17.0
>            Reporter: Joydeep Sen Sarma
>            Priority: Minor
>         Attachments: HiveTutorial.pdf
>
>   Original Estimate: 1080h
>  Remaining Estimate: 1080h
>
> Hive is a data warehouse built on top of flat files (stored primarily in 
> HDFS). It includes:
> - Data Organization into Tables with logical and hash partitioning
> - A Metastore to store metadata about Tables/Partitions etc
> - A SQL-like query language over object data stored in Tables
> - DDL commands to define and load external data into tables
> Hive's query language is executed using Hadoop map-reduce as the execution 
> engine. Queries can use either single stage or multi-stage map-reduce. Hive 
> has a native format for tables - but can handle any data set (for example 
> json/thrift/xml) using an IO library framework.
> Hive uses Antlr for query parsing, Apache JEXL for expression evaluation and 
> may use Apache Derby as an embedded database for the MetaStore. Antlr has a 
> BSD license, which should be compatible with the Apache license.
> We are currently thinking of contributing to the 0.17 branch as a contrib 
> project (since that is the version under which it will get tested internally) 
> - but looking for advice on the best release path.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
