On Aug 20, 2007, at 2:05 PM, Eric Zhang wrote:
Thanks a lot for the response, Arun. Just curious how OutputCollector
flushes key/value pairs to disk: is the periodic flush based on time
(like every couple of minutes) or on volume (like every 100 key/value
pairs of output)?
The size of the map output varies for each key/value input; it could
be as small as one key/value pair or as large as tens of millions of
key/value pairs. I could change the way my application works to avoid
this problem, but I am wondering whether Hadoop already supports
scalability in such cases, beyond just increasing memory?
It uses io.sort.mb, which is the number of megabytes of map output to
buffer before sorting and spilling to disk. (The config variable was
named back when the sort was handled very differently, hence the
unobvious name.) A major point of map/reduce is to scale to very
large data sets while making very few assumptions about what will fit
in memory at once.
-- Owen
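
[Editor's note: a minimal sketch, not from the thread, of how one
might raise that buffer with the old mapred API current in 2007. The
class name and the value 200 are hypothetical; io.sort.mb defaulted
to 100 in that era.]

    import org.apache.hadoop.mapred.JobConf;

    public class SortBufferExample {
        public static void main(String[] args) {
            JobConf conf = new JobConf(SortBufferExample.class);
            conf.setJobName("sort-buffer-example");
            // io.sort.mb: megabytes of map output buffered in memory
            // before the framework sorts it and spills a run to local
            // disk. Raise it only if map tasks have heap to spare.
            conf.setInt("io.sort.mb", 200);
            // ... set mapper/reducer and input/output paths, then
            // submit with JobClient.runJob(conf).
        }
    }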