Thanks Ted. I just didn't ask it right. Here is a stupid 101 question, which I am sure is answered in the documentation somewhere; I was just having some difficulty finding it...
When I do an "ls" on the DFS, I see this:

/user/bear/output/part-00000 <r 4>

I was probably confused about what part-##### means: I thought part-##### told how many splits a file has. So far, I have only seen part-00000. When will there be a part-00001, part-00002, etc.?

On Jan 16, 2008 11:04 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>
> Parallelizing the processing of data occurs at two steps. The first is
> during the map phase, where the input data file is (hopefully) split across
> multiple tasks. This should happen transparently most of the time, unless
> you have a perverse data format or use unsplittable compression on your
> file.
>
> This parallelism can occur whether you have one input file or many.
>
> The second level of parallelism is at the reduce phase. You set this by
> setting the number of reducers. This will also determine the number of
> output files that you get.
>
> Depending on your algorithm, it may help or hurt to have one or many
> reducers. The recent example of a program to find the 10 largest elements
> pretty much requires a single reducer. Other programs, where the mapper
> produces huge amounts of output, would be better served by having many
> reducers.
>
> This is a general answer, since the question is kind of non-specific.
>
>
> On 1/16/08 7:59 AM, "Jim the Standing Bear" <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > How do I make hadoop split its output? The program I am writing
> > crawls a catalog tree from a single URL, so initially the input
> > contains only one entry. After a few iterations, it will have tens of
> > thousands of URLs. But what I noticed is that the file is always in
> > one block (part-00000). What I would like is that once the number
> > of entries increases, it can parallelize the job. Currently that
> > doesn't seem to be the case.

--
--------------------------------------
Standing Bear Has Spoken
--------------------------------------
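To make the part-##### point concrete: each reducer writes exactly one part-NNNNN file, so a job configured with four reducers produces part-00000 through part-00003, and the default partitioner sends a key to reducer (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. The sketch below mimics that default hash-partitioning rule in plain Java, with no Hadoop dependency; the class name and sample URLs are invented for illustration, and in a real job you would instead call something like setNumReduceTasks on the job configuration.

```java
// Sketch of Hadoop's default hash-partitioning rule. A key lands on
// reducer p = (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks,
// and reducer p writes the output file part-0000p. (Class name and
// sample URLs are hypothetical; this is not Hadoop code.)
public class PartitionSketch {

    // Mirrors the default partitioner's computation: mask the sign bit
    // so negative hash codes still map to a non-negative partition.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 4; // one part-NNNNN file per reducer
        String[] urls = {
            "http://a.example/catalog",
            "http://b.example/catalog",
            "http://c.example/catalog"
        };
        for (String url : urls) {
            int p = partitionFor(url, numReduceTasks);
            System.out.printf("%s -> part-%05d%n", url, p);
        }
    }
}
```

With one reducer, every key maps to partition 0, which is why a job left at the default of a single reducer only ever produces part-00000 no matter how many URLs the input grows to.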
