Greetings,

We are excited to announce that Amazon Elastic MapReduce now supports Apache Hive, making the service even more compelling for large data set processing and analytics. Hive is an open source data warehouse and analytics package that runs on top of Hadoop. Hive is operated through a SQL-based language called Hive QL, which allows users to structure, summarize, and query data sources stored in Amazon S3. Hive QL goes beyond standard SQL, adding first-class support for map/reduce functions and complex, extensible user-defined data types such as JSON and Thrift. This capability allows processing of complex and unstructured data sources, such as text documents and log files, in applications such as data mining or click stream analysis. Hive also supports user extensions through user-defined functions written in Java and deployed by storing them in Amazon S3.

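
For readers new to map/reduce, here is a rough sketch in plain Python (not Hive QL, and not how Hive is implemented) of the kind of click stream summary described above. The log format, the sample records, and the function names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical click stream records: "<timestamp> <user_id> <page>".
SAMPLE_LOG = [
    "2009-10-01T12:00:00 u1 /home",
    "2009-10-01T12:00:05 u2 /home",
    "2009-10-01T12:00:09 u1 /products",
]


def map_line(line):
    """Map step: emit a (page, 1) pair for each well-formed log line."""
    parts = line.split()
    if len(parts) == 3:  # silently skip malformed lines
        yield parts[2], 1


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each page."""
    totals = defaultdict(int)
    for page, count in pairs:
        totals[page] += count
    return dict(totals)


def page_views(lines):
    """Run the map step over every line, then reduce the emitted pairs."""
    pairs = (pair for line in lines for pair in map_line(line))
    return reduce_counts(pairs)


if __name__ == "__main__":
    # Roughly comparable Hive QL, assuming a hypothetical table `clicks`:
    #   SELECT page, COUNT(1) FROM clicks GROUP BY page;
    print(page_views(SAMPLE_LOG))
```

In Hive, a summary like this would be written as a single query and run as Hadoop jobs over data in Amazon S3, rather than hand-coded as above.
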
Here are some resources to help you get started:

- Tutorial: Running Hive on Amazon Elastic MapReduce <http://developer.amazonwebservices.com/connect/entry.jspa?externalID=2857>
- Video: a video tutorial <http://developer.amazonwebservices.com/connect/entry.jspa?externalID=2862>
- Sample application: Operating a Data Warehouse with Hive, Amazon Elastic MapReduce and Amazon SimpleDB <http://developer.amazonwebservices.com/connect/entry.jspa?externalID=2854>

Sincerely,
The Amazon Elastic MapReduce Team
