Hello,
I have a question about the map-side join. I am trying to understand how
it is implemented, and I would be grateful if you could tell me whether it
incurs any I/O cost.
In detail: suppose we have two source files of three splits each, both
satisfying the join constraints (sorted by key, identically partitioned,
etc.). During a map-side join, these two files are merged before the map
function is invoked.
I am trying to understand how this merge is done. If I am not mistaken,
corresponding splits are merged one pair at a time. That is, split 1 of
the first source is merged with split 1 of the second source, and so on.

How is this done? Is the merge performed on the fly, using an in-memory
buffer, or is an intermediate file created on local disk?

I have read the relevant details about the iterators, but I am unsure about
the memory requirements. If each split has to be held in memory in order to
iterate over it, the join would need memory proportional to the size of a
split.
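
To make the question concrete, here is a minimal sketch of the kind of
on-the-fly merge I have in mind. It is only an illustration under my own
assumptions, not Hadoop's actual join code: the names MergeJoinSketch, Rec,
Peeking and join are hypothetical, and both inputs are assumed to be sorted
by key. If the real implementation works along these lines, only the records
sharing the current key would need to be buffered, rather than a whole split.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch: inner join of one pair of key-sorted splits, read as streams. */
final class MergeJoinSketch {

    /** One key/value record; stands in for whatever a record reader would return. */
    static final class Rec {
        final String key;
        final String value;
        Rec(String key, String value) { this.key = key; this.value = value; }
    }

    /** Iterator wrapper that keeps exactly one record of lookahead. */
    static final class Peeking {
        private final Iterator<Rec> it;
        private Rec head;
        Peeking(Iterator<Rec> it) { this.it = it; advance(); }
        Rec peek() { return head; }
        void advance() { head = it.hasNext() ? it.next() : null; }
    }

    /** Consumes all consecutive records with the given key and returns their values. */
    private static List<String> drainGroup(Peeking p, String key) {
        List<String> values = new ArrayList<>();
        while (p.peek() != null && p.peek().key.equals(key)) {
            values.add(p.peek().value);
            p.advance();
        }
        return values;
    }

    /**
     * Joins two key-sorted streams. The result is collected in a list only to
     * keep the demo small; a real job would hand each tuple to map() instead.
     */
    static List<String> join(Iterator<Rec> left, Iterator<Rec> right) {
        Peeking l = new Peeking(left);
        Peeking r = new Peeking(right);
        List<String> out = new ArrayList<>();
        while (l.peek() != null && r.peek() != null) {
            int cmp = l.peek().key.compareTo(r.peek().key);
            if (cmp < 0) {
                l.advance();                 // key only on the left: skip it
            } else if (cmp > 0) {
                r.advance();                 // key only on the right: skip it
            } else {
                String key = l.peek().key;
                // Only the records of the current key are buffered here.
                List<String> lv = drainGroup(l, key);
                List<String> rv = drainGroup(r, key);
                for (String a : lv) {
                    for (String b : rv) {
                        out.add(key + "\t" + a + "\t" + b);
                    }
                }
            }
        }
        return out;
    }
}
```

In this sketch the memory needed depends on the largest group of records
with the same key, not on the split size, and nothing is written to local
disk. What I would like to know is whether the real map-side join behaves
the same way.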

Thank you!


-- 
View this message in context: 
http://www.nabble.com/map-side-join-tp24722077p24722077.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.