Hi Folks,

We recently opened a JIRA to bring Hive into the open source fold, with the aim of contributing back to Hadoop, which has made large-scale data processing so much easier for us at Facebook. We have also uploaded a small tutorial as part of that JIRA that gives a flavor of the system's capabilities. We would love to get feedback on this, so please check out the described functionality and post any comments, criticisms, wish lists etc. on the JIRA at
https://issues.apache.org/jira/browse/HADOOP-3601

We are planning an initial release of Hive as a contrib project in the 0.19 version of Hadoop, and we are really excited about the open source possibilities it can enable, especially in the data warehousing/ETL space. So please stay tuned to the JIRA for future updates on Hive.

Thanks,
Ashish for [EMAIL PROTECTED]

-----Original Message-----
From: Ashish Thusoo (JIRA) [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 08, 2008 4:15 PM
To: Ashish Thusoo
Subject: [jira] Updated: (HADOOP-3601) Hive as a contrib project

[ https://issues.apache.org/jira/browse/HADOOP-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo updated HADOOP-3601:
----------------------------------

Attachment: HiveTutorial.pdf

Tutorial on the capabilities of Hive. This is a pdf of internal documentation and contains query, DML and DDL examples as well as an overview of the system. A formal language spec, architecture documents and roadmaps will follow. This document gives an initial preview of the system and hopefully will seed a lot of interesting discussion/questions etc. around this system.

> Hive as a contrib project
> -------------------------
>
> Key: HADOOP-3601
> URL: https://issues.apache.org/jira/browse/HADOOP-3601
> Project: Hadoop Core
> Issue Type: New Feature
> Affects Versions: 0.17.0
> Reporter: Joydeep Sen Sarma
> Priority: Minor
> Attachments: HiveTutorial.pdf
>
> Original Estimate: 1080h
> Remaining Estimate: 1080h
>
> Hive is a data warehouse built on top of flat files (stored primarily in HDFS). It includes:
> - Data Organization into Tables with logical and hash partitioning
> - A Metastore to store metadata about Tables/Partitions etc
> - A SQL like query language over object data stored in Tables
> - DDL commands to define and load external data into tables
> Hive's query language is executed using Hadoop map-reduce as the execution engine.
> Queries can use either single stage or multi-stage map-reduce. Hive has a native format for tables - but can handle any data set (for example json/thrift/xml) using an IO library framework.
> Hive uses Antlr for query parsing, Apache JEXL for expression evaluation and may use Apache Derby as an embedded database for MetaStore. Antlr has a BSD license and should be compatible with Apache license.
> We are currently thinking of contributing to the 0.17 branch as a contrib project (since that is the version under which it will get tested internally) - but looking for advice on the best release path.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
