Thanks for the info!

> Not sure what happens if you write NULL as key or value.

Looking at the code, it doesn't seem to really make a difference, and
the function in question (basically 'collect') looks to be robust to
null - but I may be missing something!

In my case, I basically want the key to be the output filename, and the
data in the files to be directly consumable by my app. Having the key
show up in the file complicates things on the app side so I'm trying to
avoid this. Passing null seems to work for now.

-lincoln

--
lincolnritter.com

On Tue, Jul 29, 2008 at 9:27 AM, Alejandro Abdelnur <[EMAIL PROTECTED]> wrote:
> On Thu, Jul 24, 2008 at 12:32 AM, Lincoln Ritter
> <[EMAIL PROTECTED]> wrote:
>
>> Alejandro said:
>>> Take a look at the MultipleOutputFormat class or MultipleOutputs (in SVN
>>> tip)
>>
>> I'm muddling through both
>> http://issues.apache.org/jira/browse/HADOOP-2906 and
>> http://issues.apache.org/jira/browse/HADOOP-3149 trying to make sense
>> of these. I'm a little confused by the way this works but it looks
>> like I can define a number of named outputs which looks like it
>> enables different output formats and I can also define some of these
>> as "multi", meaning that I can write to different "targets" (like
>> files). Is this correct?
>
> Exactly.
>
> ....
>
>> A couple of questions:
>>
>> - I needed to pass 'null' to the collect method so as to not write
>> the key to the file. These files are meant to be consumable chunks of
>> content so I want to control exactly what goes into them. Does this
>> seem normal or am i missing something? Is there a downside to passing
>> null here?
>
> Not sure what happens if you write NULL as key or value.
>
>> - What is the 'part-00000' file for? I have seen this in other
>> places in the dfs. But it seems extraneous here. It's not super
>> critical but if I can make it go away that would be great.
>
> This is the standard output of the M/R job: whatever is written to the
> OutputCollector you get in the reduce() call (or in the map() call
> when reduce=0)
>
>> - What is the purpose of the '-r-00000' suffix? Perhaps it is to
>> help with collisions?
>
> Yes, files written from a map have '-m-', files written from a reduce have
> '-r-'
>
>> I guess it seems strange that I can't just say
>> "the output file should be called X" and have an output file called X
>> appear.
>
> Well, you need the map, reduce mask and the task number mask to avoid
> collisions.
>
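For anyone reading this thread later, here is a minimal Java sketch of the two ideas discussed above: a file name built from a base name, an 'm' or 'r' marker, and a zero-padded task number, and a text line writer that leaves out a null key. It is plain Java with no Hadoop dependency and is not Hadoop's own code. The class name OutputSketch, both method names, and the tab separator are assumptions made up for illustration; whether a given output format really skips a null key should be checked against its source.

```java
import java.util.Locale;

// Hypothetical illustration only; names and separator are assumptions.
final class OutputSketch {

    // Builds a name such as "chunk-r-00000": the base name, then 'm' or 'r'
    // for the phase that wrote the file, then the zero-padded task number.
    // Two tasks writing the same base name therefore never collide.
    static String fileName(String base, boolean writtenByMap, int taskNumber) {
        if (taskNumber < 0) {
            throw new IllegalArgumentException("taskNumber must be >= 0");
        }
        String phase = writtenByMap ? "m" : "r";
        return String.format(Locale.ROOT, "%s-%s-%05d", base, phase, taskNumber);
    }

    // Formats one record as a text line. A null key is omitted together
    // with the separator, so only the value ends up in the file.
    static String formatLine(Object key, Object value) {
        StringBuilder line = new StringBuilder();
        if (key != null) {
            line.append(key);
        }
        if (key != null && value != null) {
            line.append('\t'); // assumed separator
        }
        if (value != null) {
            line.append(value);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        System.out.println(fileName("chunk", false, 0));
        System.out.println(formatLine(null, "file body only"));
    }
}
```

With a scheme like this, asking for output "X" from reduce task 0 yields "X-r-00000" rather than "X", which matches the suffix described in the quoted reply.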
