[Hadoop Wiki] Update of "Hive" by EdwardCapriolo

Apache Wiki Tue, 03 Mar 2009 15:18:06 -0800

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.


The following page has been changed by EdwardCapriolo:
http://wiki.apache.org/hadoop/Hive

------------------------------------------------------------------------------
  = What Hive is NOT =
  Hive is based on Hadoop, which is a batch processing system. Accordingly, this 
system does not and cannot promise low latencies on queries. The paradigm here 
is strictly one of submitting jobs and being notified when they complete, as 
opposed to real-time queries. As a result, it should not be compared with 
systems like Oracle, where analysis is done on a significantly smaller amount 
of data but proceeds much more iteratively, with response times between 
iterations of less than a few minutes. For Hive queries, response times for 
even the smallest jobs can be on the order of 5-10 minutes, and for larger jobs 
they may run into hours.
  
- If you input data is small you can execute a query in a short time. For 
example, if a table has 100 rows you can 'set mapred.reduce.tasks=1' and 'set 
mapred.map.tasks=1' and the query time will be ~15 seconds.
+ If your input data is small you can execute a query in a short time. For 
example, if a table has 100 rows you can 'set mapred.reduce.tasks=1' and 'set 
mapred.map.tasks=1' and the query time will be ~15 seconds.
  
  Hive does not require that the data it reads or writes be in a "Hive format"; 
there is no such thing. Hive works equally well on Thrift, RecordIO, 
control-delimited, or your own data format.
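
As a rough illustration of the small-input case described in the changed paragraph, the two settings could be issued in a Hive CLI session before the query. This is only a sketch: the table name small_table is hypothetical, the ~15 second figure comes from the page text and is not guaranteed, and mapred.map.tasks is treated by Hadoop as a hint, so the actual number of map tasks still depends on how the input is split.

```sql
-- Sketch only: small_table is a hypothetical table of about 100 rows.
-- Use a single reduce task for this session.
set mapred.reduce.tasks=1;
-- Ask for a single map task (Hadoop treats this value as a hint).
set mapred.map.tasks=1;
-- A simple aggregate that runs as one small MapReduce job.
SELECT COUNT(1) FROM small_table;
```

Both settings apply only to the current session, so larger queries run later in a new session are not affected.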
