[
https://issues.apache.org/jira/browse/HADOOP-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615673#action_12615673
]
Doug Cutting commented on HADOOP-3601:
--------------------------------------
> Owen: In particular, I don't think we should run the contrib unit tests for
> our patches.
Hmm. We might still run them, but not fail a core patch if a contrib test
fails. Or perhaps run them as a separate job in Hudson. We still want contrib
to build and pass tests, and regular Hudson tests are a good way to achieve
this.
> Eric: I'd suggest a project like Hive take either the path that ZooKeeper or
> Pig took.
As Owen pointed out, the Pig path (incubator) isn't required here, unless Hive
wants to be a TLP (as Pig did at the time). The ZooKeeper path (new Hadoop
subproject) is available. I don't have a strong preference. If Hive is
incorporated as a contrib module and generates too much mailing list traffic
on core lists, that's a success disaster we can remedy by promoting it to a
subproject. Or, if folks feel confident from the start that it will sustain a
subproject and are willing to create the infrastructure for it, that's fine
too. As Owen mentioned, a subproject takes more time to set up: a JIRA
instance, mailing lists, a web site, etc., especially if the folks involved
are not already familiar with how these things are done at Apache. But it's
not that hard.
> Hive as a contrib project
> -------------------------
>
> Key: HADOOP-3601
> URL: https://issues.apache.org/jira/browse/HADOOP-3601
> Project: Hadoop Core
> Issue Type: New Feature
> Affects Versions: 0.17.0
> Reporter: Joydeep Sen Sarma
> Priority: Minor
> Attachments: HiveTutorial.pdf
>
> Original Estimate: 1080h
> Remaining Estimate: 1080h
>
> Hive is a data warehouse built on top of flat files (stored primarily in
> HDFS). It includes:
> - Data organization into Tables with logical and hash partitioning
> - A Metastore to store metadata about Tables, Partitions, etc.
> - A SQL-like query language over object data stored in Tables
> - DDL commands to define and load external data into tables
> Hive's query language is executed using Hadoop map-reduce as the execution
> engine. Queries can use either single-stage or multi-stage map-reduce. Hive
> has a native format for tables, but can handle any data set (for example
> JSON/Thrift/XML) using an IO library framework.
> Hive uses Antlr for query parsing and Apache JEXL for expression evaluation,
> and may use Apache Derby as an embedded database for the Metastore. Antlr
> has a BSD license, which should be compatible with the Apache license.
> We are currently thinking of contributing to the 0.17 branch as a contrib
> project (since that is the version under which it will get tested
> internally), but we are looking for advice on the best release path.
--