Hi, 

I am working on a research project on optimizing join algorithms implemented in
MapReduce.

As I understand it, Pig currently implements three specialized join types:
replicated join, skewed join, and merge join. From reading the documentation,
it seems that replicated and merge joins are map-side joins, while skewed join
is a reduce-side join. Is that correct?
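To check my understanding of the map-side vs. reduce-side distinction, I wrote the toy Python simulation below. It is my own sketch, not Pig's code, and the relations and function names are made up. The reduce-side version tags each record with its source and relies on the shuffle to bring matching keys together. The replicated version loads the small relation into an in-memory hash table inside the mapper, so no shuffle or reduce phase is needed.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy relations: lists of (join_key, value) tuples.
orders = [(1, "book"), (1, "pen"), (3, "lamp"), (4, "desk")]  # "big" input
users = [(1, "alice"), (2, "bob"), (3, "carol")]              # "small" input


def reduce_side_join(left, right):
    """Repartition join: the map phase tags each record with its source,
    the shuffle groups records by key, and the reduce phase pairs them."""
    # Map: emit (key, (tag, value)) so the reducer knows each record's origin.
    mapped = [(k, ("L", v)) for k, v in left] + [(k, ("R", v)) for k, v in right]
    # Shuffle: group all tagged values by join key.
    groups = defaultdict(list)
    for k, tagged in mapped:
        groups[k].append(tagged)
    # Reduce: cross product of left and right values that share a key.
    out = []
    for k in sorted(groups):
        lvals = [v for tag, v in groups[k] if tag == "L"]
        rvals = [v for tag, v in groups[k] if tag == "R"]
        out.extend((k, lv, rv) for lv, rv in product(lvals, rvals))
    return out


def replicated_join(big, small):
    """Map-side (broadcast hash) join: the mapper (here a single loop) builds
    an in-memory hash table from the small input and probes it while
    streaming the big input. No shuffle, no reduce, and no sorted input."""
    table = defaultdict(list)
    for k, v in small:
        table[k].append(v)
    return [(k, bv, sv) for k, bv in big for sv in table.get(k, [])]


print(replicated_join(orders, users))
# [(1, 'book', 'alice'), (1, 'pen', 'alice'), (3, 'lamp', 'carol')]
print(sorted(reduce_side_join(orders, users)) == sorted(replicated_join(orders, users)))
# True
```

Both produce the same rows; the difference is where the work happens and whether a shuffle is needed.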

I have a few questions:

1. Does replicated join require the datasets to be sorted? (I know merge join
requires sorted datasets.)
2. Can anyone point me to the actual implementation of the MapReduce program
that Pig generates for these three kinds of joins, or to the code that maps a
Pig join onto a Hadoop MapReduce join algorithm?
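My guess for question 1 is that sorting matters for merge-based joins but not for hash-based ones. The Python sketch below is a generic sort-merge join (my own illustration, not Pig's POMergeJoin; the names and data are hypothetical). It walks two key-ordered inputs with forward-only cursors, which is why it needs sorted data, whereas a hash-based join that loads the small relation into memory can read its inputs in any order.

```python
# Hypothetical toy relations, already sorted by join key.
orders = [(1, "book"), (1, "pen"), (3, "lamp"), (4, "desk")]
users = [(1, "alice"), (2, "bob"), (3, "carol")]


def is_sorted_by_key(rel):
    return all(a[0] <= b[0] for a, b in zip(rel, rel[1:]))


def merge_join(left, right):
    """Sort-merge join: both cursors only move forward, so each input is
    read in a single pass. That only works if both are sorted by key."""
    if not (is_sorted_by_key(left) and is_sorted_by_key(right)):
        raise ValueError("merge join requires both inputs sorted by join key")
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1  # left key too small: advance left
        elif lk > rk:
            j += 1  # right key too small: advance right
        else:
            # Find the run of right records with this key ...
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # ... and pair it with every left record carrying the same key.
            while i < len(left) and left[i][0] == lk:
                out.extend((lk, left[i][1], rv) for _, rv in right[j:j_end])
                i += 1
            j = j_end
    return out


print(merge_join(orders, users))
# [(1, 'book', 'alice'), (1, 'pen', 'alice'), (3, 'lamp', 'carol')]
```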

I found POMergeJoin and POSkewedJoin, but I still can't figure out what the
actual MapReduce implementation looks like.
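As far as I can tell from the documentation, skewed join samples the left input to build a histogram of join keys, then splits heavily loaded keys across several reducers, while the right input's records for those keys are sent to each of those reducers. To think about what the generated MapReduce job has to do, I wrote the toy simulation below. It is my own sketch, not Pig's POSkewedJoin or its partitioner; the thresholds (`sample_rate`, `hot_fraction`, `num_reducers`) and the round-robin split are assumptions for illustration.

```python
import random
from collections import Counter, defaultdict
from itertools import product


def skewed_join(left, right, num_reducers=4, sample_rate=0.1,
                hot_fraction=0.25, seed=0):
    """Toy skew-aware reduce-side join (hypothetical parameters).

    1. Sample the left input to estimate how often each key occurs.
    2. A key whose sampled share exceeds hot_fraction is "hot": its left
       records are spread round-robin over all reducers, and its right
       records are copied to every reducer.
    3. All other keys are hash-partitioned as in a plain reduce-side join.
    Returns the joined tuples and the set of keys treated as hot.
    """
    rng = random.Random(seed)
    sample = [k for k, _ in left if rng.random() < sample_rate]
    total = len(sample) or 1
    hot = {k for k, c in Counter(sample).items() if c / total > hot_fraction}

    # Map phase: partitions[p][key] holds (left_values, right_values).
    partitions = defaultdict(lambda: defaultdict(lambda: ([], [])))
    next_rr = 0
    for k, v in left:
        if k in hot:
            p = next_rr % num_reducers  # spread the hot key's load
            next_rr += 1
        else:
            p = hash(k) % num_reducers
        partitions[p][k][0].append(v)
    for k, v in right:
        # Hot keys: any reducer may hold matching left rows, so copy to all.
        targets = range(num_reducers) if k in hot else [hash(k) % num_reducers]
        for p in targets:
            partitions[p][k][1].append(v)

    # Reduce phase: each reducer joins only the records routed to it.
    out = []
    for p in sorted(partitions):
        for k, (lvals, rvals) in partitions[p].items():
            out.extend((k, lv, rv) for lv, rv in product(lvals, rvals))
    return out, hot


left = [(1, f"a{i}") for i in range(20)] + [(2, "b"), (3, "c")]
right = [(1, "x"), (1, "y"), (3, "z")]
result, hot = skewed_join(left, right, sample_rate=1.0)
print(hot, len(result))  # {1} 41
```

Each left record goes to exactly one reducer, and each matching right record is present wherever a left record for that key can land, so every output pair is produced exactly once while the hot key's 20 rows are split over four reducers.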

Thanks

Yunming
