a question on number of parallel tasks

Jim the Standing Bear Wed, 16 Jan 2008 08:00:08 -0800

Hi,

How do I make Hadoop split its output?  The program I am writing
crawls a catalog tree starting from a single URL, so initially the
input contains only one entry.  After a few iterations, it will have
tens of thousands of URLs.  But what I noticed is that the output is
always written as one block (part-00000).  What I would like is for
Hadoop to parallelize the job once the number of entries increases.
Currently that doesn't seem to be the case.


-- 
--------------------------------------
Standing Bear Has Spoken
--------------------------------------
