[ https://issues.apache.org/jira/browse/HADOOP-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616271#action_12616271 ]
Joydeep Sen Sarma commented on HADOOP-3601:
-------------------------------------------
synced up with a few folks working on this internally. in a nutshell - the
contributors seem to like the idea of making this a contrib project to begin
with.
the sub-project requirements (in terms of PMC involvement) are fairly rigorous
and would probably extend the timeline for releasing hive into the hadoop
ecosystem. that is our primary concern at this time. as the project matures,
it's possible (even likely) that a sub-project designation will become more
appropriate.
to address the concerns about email traffic on core-dev, we have a suggestion:
if we can put the 'component' field in the email header (Pete found this useful
link: http://www.atlassian.com/software/jira/docs/latest/emailcontent.html),
then client-side mail filtering should be able to separate hive jira traffic
from that of hadoop (or other contrib projects). there have also already been
suggestions on this thread about not letting contrib test failures block
acceptance of patches, which would probably alleviate the other major concern:
slowing core development down. would these address most of the concerns
motivating the sandbox/sub-project discussion?
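A minimal Python sketch of that client-side filtering, assuming the component ends up in a mail header named `X-JIRA-Component` and that hive issues carry a component called `contrib/hive`. Both names are hypothetical: the real header (or subject tag) depends on how the JIRA notification template is configured. Only the standard-library `email` package is used.

```python
from email import message_from_string
from email.message import Message
from typing import List

# HYPOTHETICAL names: the proposal only says the 'component' field would be
# put into the email header; the actual header name and component value
# depend on the JIRA notification configuration.
COMPONENT_HEADER = "X-JIRA-Component"
HIVE_COMPONENT = "contrib/hive"


def components_of(msg: Message) -> List[str]:
    """Return the (lower-cased) components listed in the header, if any."""
    value = msg.get(COMPONENT_HEADER, "")
    # An issue may have several components, assumed here to be comma-separated.
    return [c.strip().lower() for c in value.split(",") if c.strip()]


def is_hive_traffic(msg: Message) -> bool:
    return HIVE_COMPONENT in components_of(msg)


def route(raw: str) -> str:
    """Pick a (hypothetical) local folder for one raw RFC 822 message."""
    msg = message_from_string(raw)
    return "hive" if is_hive_traffic(msg) else "hadoop-core"


if __name__ == "__main__":
    hive_mail = "Subject: example hive notification\nX-JIRA-Component: contrib/hive\n\nbody\n"
    core_mail = "Subject: example core notification\nX-JIRA-Component: dfs\n\nbody\n"
    print(route(hive_mail))  # hive
    print(route(core_mail))  # hadoop-core
```

Messages without the header fall through to the general folder, so a missing or misconfigured header never hides core traffic.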
i don't think we will see a lot of traffic on the core-users mailing list (based
on the follow-up traffic from Ashish's posting of the hive language tutorial),
but we will just have to see how that turns out.
> Hive as a contrib project
> -------------------------
>
> Key: HADOOP-3601
> URL: https://issues.apache.org/jira/browse/HADOOP-3601
> Project: Hadoop Core
> Issue Type: New Feature
> Affects Versions: 0.17.0
> Reporter: Joydeep Sen Sarma
> Priority: Minor
> Attachments: HiveTutorial.pdf
>
> Original Estimate: 1080h
> Remaining Estimate: 1080h
>
> Hive is a data warehouse built on top of flat files (stored primarily in
> HDFS). It includes:
> - Data Organization into Tables with logical and hash partitioning
> - A Metastore to store metadata about Tables/Partitions etc.
> - A SQL-like query language over object data stored in Tables
> - DDL commands to define and load external data into tables
> Hive's query language is executed using Hadoop map-reduce as the execution
> engine. Queries can use either single-stage or multi-stage map-reduce. Hive
> has a native format for tables, but can handle any data set (for example
> json/thrift/xml) using an IO library framework.
> Hive uses Antlr for query parsing, Apache JEXL for expression evaluation and
> may use Apache Derby as an embedded database for the MetaStore. Antlr has a BSD
> license and should be compatible with the Apache license.
> We are currently thinking of contributing to the 0.17 branch as a contrib
> project (since that is the version under which it will get tested internally),
> but we are looking for advice on the best release path.
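To illustrate the execution model the description outlines, here is a toy, in-memory Python simulation of how a simple SQL-like aggregation such as `SELECT page, COUNT(1) FROM t GROUP BY page` could run as a single map-reduce stage, with a hash partitioner deciding which reducer receives each key. It is not Hive's query planner or Hadoop's API; the table, column and function names are hypothetical. Hadoop's default partitioner follows the same idea, taking the key's hash modulo the number of reduce tasks.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, str]


def partition(key: str, num_reducers: int) -> int:
    """Hash-partition a key onto one of num_reducers reducers.

    CRC32 keeps the assignment stable across runs, unlike Python's
    per-process randomized hash() for strings.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def map_phase(rows: Iterable[Row], column: str) -> Iterator[Tuple[str, int]]:
    # Map: emit (group key, 1) for every input row.
    for row in rows:
        yield row[column], 1


def shuffle(pairs: Iterable[Tuple[str, int]], num_reducers: int) -> List[Dict[str, List[int]]]:
    # Shuffle: send each pair to the reducer chosen by its key's hash,
    # grouping values by key within that reducer.
    buckets: List[Dict[str, List[int]]] = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)][key].append(value)
    return buckets


def reduce_phase(buckets: List[Dict[str, List[int]]]) -> Dict[str, int]:
    # Reduce: each reducer sums the counts for the keys it received. A key
    # lands in exactly one bucket, so no result is overwritten.
    result: Dict[str, int] = {}
    for bucket in buckets:
        for key, values in bucket.items():
            result[key] = sum(values)
    return result


def group_by_count(rows: Iterable[Row], column: str, num_reducers: int = 2) -> Dict[str, int]:
    """Toy equivalent of: SELECT column, COUNT(1) FROM t GROUP BY column."""
    return reduce_phase(shuffle(map_phase(rows, column), num_reducers))


if __name__ == "__main__":
    page_views = [{"page": "a"}, {"page": "b"}, {"page": "a"}]
    print(group_by_count(page_views, "page"))
```

For example, `group_by_count([{"page": "a"}, {"page": "b"}, {"page": "a"}], "page")` returns `{"a": 2, "b": 1}`. A query that needs, say, a join followed by an aggregation would typically chain several such stages, which corresponds to the multi-stage case.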
--
This message is automatically generated by JIRA.