Hi Ashish,

I am very excited to try this. I have recently been evaluating Hadoop, HBase, Cascading etc. to process 100 million biodiversity records (expecting billions soon) for data mining purposes (for example, species that are critically endangered and observed outside of protected areas within the last 2 years). All of this is open-access biodiversity information.

It is difficult to comment on the paper: it looks to offer most of what I am looking for, but without running it, it's difficult...
If you would like a tester, I would happily fill this role and offer sample code and input files, which could go into "getting started" guides on the wiki etc.

Cheers,
Tim

On Wed, Jul 9, 2008 at 9:47 AM, Ashish Thusoo <[EMAIL PROTECTED]> wrote:
> Hi Folks,
>
> We recently opened up a JIRA in order to bring Hive into the open source
> fold with the aim of contributing back to hadoop - which has really made
> large scale data processing so much easier for us at Facebook. We have
> also uploaded a small tutorial as part of that JIRA that gives a flavor
> of what kind of capabilities the system has. We would love to get
> feedback on this, so please check out the described functionality and
> post any comments, criticisms, wish lists etc. on the JIRA at
>
> https://issues.apache.org/jira/browse/HADOOP-3601
>
> We are planning on an initial release of hive as a contrib project in
> 0.19 version of hadoop and are really excited about the open source
> possibilities that it can enable, especially in the data warehousing/ETL
> space. So please stay tuned to the JIRA for future updates on Hive.
>
> Thanks,
> Ashish for [EMAIL PROTECTED]
>
> -----Original Message-----
> From: Ashish Thusoo (JIRA) [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, July 08, 2008 4:15 PM
> To: Ashish Thusoo
> Subject: [jira] Updated: (HADOOP-3601) Hive as a contrib project
>
>
> [ https://issues.apache.org/jira/browse/HADOOP-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Ashish Thusoo updated HADOOP-3601:
> ----------------------------------
>
> Attachment: HiveTutorial.pdf
>
> Tutorial on the capabilities of Hive. This is a pdf of internal
> documentation and contains query, dml and ddl examples as well as the
> overview of the system. A formal language spec, architecture documents
> and roadmaps will follow. This document gives the initial preview of the
> system and hopefully will seed a lot of interesting discussion/questions
> etc. around this system.
>
> > Hive as a contrib project
> > -------------------------
> >
> > Key: HADOOP-3601
> > URL: https://issues.apache.org/jira/browse/HADOOP-3601
> > Project: Hadoop Core
> > Issue Type: New Feature
> > Affects Versions: 0.17.0
> > Reporter: Joydeep Sen Sarma
> > Priority: Minor
> > Attachments: HiveTutorial.pdf
> >
> > Original Estimate: 1080h
> > Remaining Estimate: 1080h
> >
> > Hive is a data warehouse built on top of flat files (stored primarily
> > in HDFS). It includes:
> > - Data Organization into Tables with logical and hash partitioning
> > - A Metastore to store metadata about Tables/Partitions etc
> > - A SQL like query language over object data stored in Tables
> > - DDL commands to define and load external data into tables
> >
> > Hive's query language is executed using Hadoop map-reduce as the
> > execution engine. Queries can use either single stage or multi-stage
> > map-reduce. Hive has a native format for tables - but can handle any
> > data set (for example json/thrift/xml) using an IO library framework.
> >
> > Hive uses Antlr for query parsing, Apache JEXL for expression
> > evaluation and may use Apache Derby as an embedded database for
> > MetaStore. Antlr has a BSD license and should be compatible with
> > Apache license.
> >
> > We are currently thinking of contributing to the 0.17 branch as a
> > contrib project (since that is the version under which it will get
> > tested internally) - but looking for advice on the best release path.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
