[
https://issues.apache.org/jira/browse/HADOOP-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616179#action_12616179
]
Doug Cutting commented on HADOOP-3601:
--------------------------------------
> Initially the project will be very active.
Sounds like a sub-project might be called for.
> can someone post details on how sub-projects are organized in terms of source
> code organization/checkin rules/branching etc. ?
Look at HBase and Zookeeper for examples. A sub-project has its own trunk,
branches and tags in subversion, and releases separately. It has its own
mailing lists and jira instance. It has a separate list of committers. All
Hadoop subprojects are overseen by the Hadoop PMC. For this to be effective,
each subproject should have several PMC members who are active on it, ideally
three or more. Creation of a new sub-project requires a vote of the PMC and
should be discussed on [EMAIL PROTECTED], while creation of a contrib module is
generally handled much like any other patch.
If Hive were a sub-project, it would probably include Hadoop Core jars in its
lib/ directory. Hive releases would lag Hadoop Core releases.
> Hive as a contrib project
> -------------------------
>
> Key: HADOOP-3601
> URL: https://issues.apache.org/jira/browse/HADOOP-3601
> Project: Hadoop Core
> Issue Type: New Feature
> Affects Versions: 0.17.0
> Reporter: Joydeep Sen Sarma
> Priority: Minor
> Attachments: HiveTutorial.pdf
>
> Original Estimate: 1080h
> Remaining Estimate: 1080h
>
> Hive is a data warehouse built on top of flat files (stored primarily in
> HDFS). It includes:
> - Data Organization into Tables with logical and hash partitioning
> - A Metastore to store metadata about Tables/Partitions etc
> - A SQL-like query language over object data stored in Tables
> - DDL commands to define and load external data into tables
> Hive's query language is executed using Hadoop map-reduce as the execution
> engine. Queries can use either single stage or multi-stage map-reduce. Hive
> has a native format for tables - but can handle any data set (for example
> json/thrift/xml) using an IO library framework.
> Hive uses Antlr for query parsing, Apache JEXL for expression evaluation and
> may use Apache Derby as an embedded database for MetaStore. Antlr has a BSD
> license and should be compatible with Apache license.
> We are currently thinking of contributing to the 0.17 branch as a contrib
> project (since that is the version under which it will get tested internally)
> - but looking for advice on the best release path.