Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.

The "Hive/JoinOptimization" page has been changed by LiyinTang.
http://wiki.apache.org/hadoop/Hive/JoinOptimization?action=diff&rev1=4&rev2=5

--------------------------------------------------

  == 1.1 Using Distributed Cache to Propagate Hashtable File ==
  Previously, when two large tables needed to be joined, two different Mappers would sort the tables on the join key and emit intermediate files, and the Reducer would then take these intermediate files as input and do the real join work. This join mechanism works well when both tables are large. But if one of the join tables is small enough to fit into the Mapper's memory, there is no need to launch a Reducer at all. In fact, the Reduce stage is very expensive for performance, because the Map/Reduce framework needs to sort and merge the intermediate files.
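  
  For illustration, a plain join like the following (hypothetical table names) runs as this common, reduce-side join:
  
  {{{
  -- Hypothetical tables: both sales and orders are large, so Hive runs a
  -- common join. Mappers emit records keyed on order_id, and the Reducer
  -- merges the two sorted streams to produce the joined rows.
  SELECT s.order_id, o.order_date
  FROM sales s
  JOIN orders o ON (s.order_id = o.order_id);
  }}}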
  
- {{attachment:fig1.jpg||height="764px",width="911px"}}
+ {{attachment:fig1.jpg||height="763px",width="909px"}}
  
  '''Fig 1. The Previous Map Join'''
  
  So the basic idea of map join is to hold the data of the small table in the Mapper's memory and do the join work in the Map stage, which saves the Reduce stage. As shown in Fig 1, the previous map join operation does not scale to large data, because each Mapper reads the small table data directly from HDFS. If the large data file is big enough, thousands of Mappers will be launched to read different records of it, and all of those Mappers will read the small table data from HDFS into memory. This makes the small table the performance bottleneck, and sometimes the Mappers hit many time-outs while reading this small file, which may cause the task to fail.
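  
  As a minimal sketch (hypothetical table names, with dim_country assumed small enough to fit in memory), a map join can be requested explicitly with the MAPJOIN hint:
  
  {{{
  -- The MAPJOIN hint asks Hive to load dim_country into each Mapper's memory
  -- and perform the join in the Map stage only, skipping the Reduce stage.
  SELECT /*+ MAPJOIN(c) */ s.order_id, c.country_name
  FROM sales s
  JOIN dim_country c ON (s.country_id = c.country_id);
  }}}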
  
- {{attachment:fig2.jpg}}
+ {{attachment:fig2.jpg||height="881px",width="1184px"}}
  
  HIVE-1641 ([[http://issues.apache.org/jira/browse/HIVE-1641|http://issues.apache.org/jira/browse/HIVE-1641]]) has solved this problem. As shown in Fig 2, the basic idea is to create a new task, a MapReduce Local Task, before the original join Map/Reduce task. This new task reads the small table data from HDFS into an in-memory hashtable. After reading, it serializes the in-memory hashtable into files on disk and compresses the hashtable files into a tar file. In the next stage, when the MapReduce task is launching, it puts this tar file into the Hadoop Distributed Cache, which populates the tar file to each Mapper's local disk and decompresses it there. So all the Mappers can deserialize the hashtable files back into memory and do the join work as before.
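  
  The effect is visible in the query plan: with this change a small local stage runs before the join's Map/Reduce stage. A sketch, using the same hypothetical tables (the exact stage names in the plan vary by Hive version):
  
  {{{
  -- EXPLAIN shows the extra local stage that builds and dumps the hashtable
  -- before the Map/Reduce join stage (exact plan text varies by version).
  EXPLAIN
  SELECT /*+ MAPJOIN(c) */ s.order_id, c.country_name
  FROM sales s
  JOIN dim_country c ON (s.country_id = c.country_id);
  }}}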
  
  == 1.2 Removing JDBM ==
- When profiing the Map Join,
+ Previously, Hive used JDBM ([[http://jdbm.sourceforge.net/|http://jdbm.sourceforge.net/]]) as a persistent hashtable. Whenever the in-memory hashtable could not hold any more data, it swapped the key/value pairs into the JDBM table. However, when profiling the Map Join we found that this JDBM component takes more than 70% of the CPU time, as shown in Fig 3. Also, the persistent file JDBM generates is too large to put into the Distributed Cache. For example, putting 67,000 simple integer key/value pairs into JDBM generates a hashtable file of more than 22 MB. So JDBM is too heavyweight for Map Join, and it is better to remove this component from Hive. Map Join is designed for holding a small table's data in memory; if the table is too large to hold, the query should just run as a Common Join, so there is no need for a persistent hashtable any more.
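  
  A minimal sketch of the related session settings that bound how large the "small" table may be; the parameter names below come from later Hive releases and are an assumption here, and the default values may differ:
  
  {{{
  -- Assumed parameter names (from later Hive releases; defaults may differ).
  -- Abort the local hashtable-building task if it uses more than 90% of its
  -- memory, i.e. the "small" table is actually too large to hold in memory.
  set hive.mapjoin.localtask.max.memory.usage=0.90;
  -- Only treat a table as "small" for map join purposes if its total file
  -- size on disk is below this threshold (in bytes).
  set hive.mapjoin.smalltable.filesize=25000000;
  }}}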
  
  == 1.3 Performance Evaluation ==
  = 2. Converting Join into Map Join dynamically =
