Hi, 

As each row of my HBase table can take a long time to process (waiting on
answers from other hosts), I would like to create a few threads to process
that data in parallel. I would then use the last call to the map function to
wait for all threads to finish, and only return from that last call once
everything is done and all threads have exited.
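
To make the idea concrete, here is roughly what I have in mind, as a plain-Java
sketch without any Hadoop or HBase classes. RowWorkerPool, submit, processRow
and awaitAll are names I made up for illustration, and the sketch assumes there
is some hook that runs after the last map call where awaitAll could be called:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of Hadoop or HBase.
public class RowWorkerPool {
    private final ExecutorService pool;
    private final AtomicInteger processed = new AtomicInteger();

    public RowWorkerPool(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    // Would be called once per row from map(); queues the row and returns at once.
    public void submit(String rowKey) {
        pool.execute(() -> {
            processRow(rowKey);
            processed.incrementAndGet();
        });
    }

    // Placeholder for the slow part (the requests to the other hosts).
    protected void processRow(String rowKey) {
    }

    // Would be called once after the last row (assumed hook): stops accepting
    // new rows and blocks until every queued row has been processed.
    public int awaitAll(long timeout, TimeUnit unit) throws InterruptedException {
        pool.shutdown();
        if (!pool.awaitTermination(timeout, unit)) {
            throw new IllegalStateException("rows still running after timeout");
        }
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        RowWorkerPool workers = new RowWorkerPool(4);
        for (int i = 0; i < 100; i++) {
            workers.submit("row-" + i);
        }
        // Prints the number of rows that finished processing.
        System.out.println(workers.awaitAll(1, TimeUnit.MINUTES));
    }
}
```

The missing piece is only where to call awaitAll, which is what the question
below is about.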

How do I know when the last row is being passed to my map function? (I'm
extending TableMap for my mapper, as is done in the wiki examples.) I didn't
find any function to check this.

Another possibility would be to create more mapper jobs and let the Hadoop
framework do the processing in parallel. However, I read somewhere that each
mapper gets an entire region. In my case, the data in each row is very
small, so each mapper could get millions of rows (with the default
region/block size).

What would you do?

Thanks,
Thibaut
