[Hadoop Wiki] Trivial Update of "Hive" by RaghothamMurthy

Apache Wiki Mon, 25 Aug 2008 12:42:37 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.


The following page has been changed by RaghothamMurthy:
http://wiki.apache.org/hadoop/Hive

------------------------------------------------------------------------------
  = What is Hive =
  Hive is a data warehouse infrastructure built on top of Hadoop that provides 
tools to enable easy data summarization, ad hoc querying and analysis of large 
datasets stored in Hadoop files. It provides a mechanism to put structure 
on this data, and it also provides a simple query language called QL, which is 
based on SQL and enables users familiar with SQL to query this data. At 
the same time, this language allows traditional map/reduce programmers to 
plug in their custom mappers and reducers to do more sophisticated 
analysis that may not be supported by the built-in capabilities of the 
language.
  
- = What is NOT Hive =
+ = What Hive is NOT =
  Hive is based on Hadoop, which is a batch processing system. Accordingly, this 
system does not and cannot promise low latencies on queries. The paradigm here 
is strictly one of submitting jobs and being notified when the jobs are 
completed, as opposed to real-time queries. As a result, it should not be 
compared with systems like Oracle, where analysis is done on a significantly 
smaller amount of data but proceeds much more iteratively, with response times 
between iterations of less than a few minutes. For Hive queries, response 
times for even the smallest jobs can be of the order of 5-10 minutes, and for 
larger jobs this may even run into hours.
  
  = Status =
