Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.

The "Hive/JoinOptimization" page has been changed by Kirk True.
The comment on this change is: Fixed what appears to be a copy-and-paste error 
with the Jira links.
http://wiki.apache.org/hadoop/Hive/JoinOptimization?action=diff&rev1=12&rev2=13

--------------------------------------------------

  
  In Fig 1 above, the previous map join implementation does not scale well when the larger table is huge, because each Mapper reads the small table's data directly from HDFS. When the larger table is huge, thousands of Mappers are launched to read different records of it, and all of those Mappers read the small table's data from HDFS into their memory. This can make access to the small table the performance bottleneck, and sometimes the Mappers hit many time-outs while reading this small file, which may cause the task to fail.
  
+ Hive-1641 ([[http://issues.apache.org/jira/browse/HIVE-1641|HIVE-1641]]) has solved this problem, as shown in Fig 2 below.
  
  {{attachment:fig2.jpg||height="881px",width="1184px"}}
  
@@ -23, +23 @@

  Obviously, the Local Task is very memory-intensive, so the query processor launches it in a child JVM with the same heap size as the Mapper's. Since the Local Task may run out of memory, the query processor measures its memory usage very carefully. Once the memory usage of the Local Task rises above a threshold, the Local Task aborts itself and tells the user that the table is too large to hold in memory. Users can change this threshold with '''''set hive.mapjoin.localtask.max.memory.usage = 0.999;'''''
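  As an illustration, a session that raises this threshold before running a hinted map join might look like the following sketch (the src1/src2 tables and the x/y aliases are the same illustrative names used elsewhere on this page):

{{{
-- Allow the Local Task to use up to 99.9% of its heap before it aborts.
set hive.mapjoin.localtask.max.memory.usage = 0.999;

-- Hinted map join: the hinted table x (src1) is the small table that the
-- Local Task loads into the in-memory hashtable before the Mappers run.
select /*+ MAPJOIN(x) */ x.key, y.value
from src1 x join src2 y on x.key = y.key;
}}}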
  
  == 1.2 Removing JDBM ==
+ Previously, Hive used JDBM ([[http://jdbm.sourceforge.net/]]) as a persistent hashtable: whenever the in-memory hashtable could not hold any more data, it swapped key/value pairs out into the JDBM table. However, when profiling the Map Join, we found that this JDBM component takes more than 70% of the CPU time, as shown in Fig 3. The persistent file JDBM generates is also too large to put into the Distributed Cache; for example, putting 67,000 simple integer key/value pairs into JDBM generates a hashtable file of more than 22 MB, several hundred bytes per pair. So JDBM is too heavyweight for Map Join, and it is better to remove this component from Hive. Map Join is designed to hold the small table's data in memory; if the table is too large to hold, the query simply runs as a Common Join, so there is no need for a persistent hashtable any more. This was done in Hive-1754 ([[http://issues.apache.org/jira/browse/HIVE-1754|HIVE-1754]]).
  
  {{attachment:fig3.jpg}}
  
@@ -42, +42 @@

  == 2.1 New Join Execution Flow ==
  Since a map join is faster than a common join, it is better to run the map join whenever possible. Previously, Hive users needed to give a hint in the query to specify which table is the small table, for example: '''''select /*+ MAPJOIN(x) */ * from src1 x join src2 y on x.key = y.key;'''''. This is bad for both user experience and query performance, because users may give a wrong hint or no hint at all. It would be much better to convert the Common Join into a Map Join without any hint from the user.
  
+ Hive-1642 ([[http://issues.apache.org/jira/browse/HIVE-1642|HIVE-1642]]) has solved this problem by converting the Common Join into a Map Join automatically. For the Map Join, the query processor needs to know which input table is the big table; the other input tables are recognized as the small tables during the execution stage, and they need to be held in memory. However, in general, the query processor has no idea of the input file sizes at compile time (even with statistics), because some of the tables may be intermediate tables generated from sub-queries. So the query processor can only figure out the input file sizes during execution.
  
  Right now, users need to enable this feature with '''''set hive.auto.convert.join = true;'''''
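  As an illustration, a minimal session that relies on the automatic conversion might look like this sketch (src1/src2 are again illustrative; note that no hint is given):

{{{
-- Let Hive convert eligible Common Joins into Map Joins automatically.
set hive.auto.convert.join = true;

-- No MAPJOIN hint: the query processor figures out at execution time which
-- input is small enough to hold in memory and converts the join if it can.
select x.key, y.value
from src1 x join src2 y on x.key = y.key;
}}}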
  
