Hi Harsh,

I am using CDH3u0 (Hadoop 0.20.2).
I can't share my code because of company rules, but these are the steps I perform:

CASE 1:
- Use TextInputFormat to read content from the file
- Perform the record transformation in the mapper
- Write output using TextOutputFormat

While running this job I pass the -Ddfs.block.size parameter via GenericOptionsParser. In this case everything works as expected.

CASE 2:
- Use TextInputFormat to read content from the file
- Perform the record transformation in the mapper
- If the transformation succeeds, write the record to the "successful" file using MultipleOutputs
- If the transformation fails, write the record to the "failed" file using MultipleOutputs

In the mapper's setup method I create an instance of MultipleOutputs:

  MultipleOutputs outputs = new MultipleOutputs(context);

In the map method I call outputs.write("successful", K, V) or outputs.write("failed", K, V) based on the result of the transformation logic. I configure the named outputs via GenericOptionsParser:

  -Dmapreduce.inputformat.class=org.apache.hadoop.mapreduce.lib.input.TextInputFormat
  -Dmapreduce.map.class=MyMapper
  -Dmapreduce.multipleoutputs="successful failed"
  -Dmapreduce.multipleoutputs.namedOutput.successful.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
  -Dmapreduce.multipleoutputs.namedOutput.successful.key=org.apache.hadoop.io.Text
  -Dmapreduce.multipleoutputs.namedOutput.successful.value=org.apache.hadoop.io.Text
  -Dmapreduce.multipleoutputs.namedOutput.failed.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
  -Dmapreduce.multipleoutputs.namedOutput.failed.key=org.apache.hadoop.io.Text
  -Dmapreduce.multipleoutputs.namedOutput.failed.value=org.apache.hadoop.io.Text

While running this job I also pass the -Ddfs.block.size parameter via GenericOptionsParser. Depending on the block size, I lose data in the output file: in some cases half of a line is missing, in others the last couple of lines. One other thing I have noticed is that the output file size is always exactly <integer> * <block_size>; there is never a partially filled block.

I have pasted a stripped-down sketch of the CASE 2 mapper at the bottom of this mail, below the quoted thread.

Hope this helps.

thanks,
dino

On Thu, Aug 18, 2011 at 12:09 PM, Harsh J <ha...@cloudera.com> wrote:
> Dino,
>
> Need some more information:
> - Version of Hadoop?
> - Do you have a runnable sample test case to reproduce this? Or can
> you describe roughly the steps you are performing to create an output?
>
> FWIW, I ran the trunk's MO tests and those seem to pass for both APIs,
> but they do not change dfs.block.size, although I fail to see the
> relation between these.
>
> On Thu, Aug 18, 2011 at 2:00 PM, Dino Kečo <dino.k...@gmail.com> wrote:
> > Hi all,
> >
> > I have been working on Hadoop jobs which write output into multiple
> > files. In the Hadoop API I found the class MultipleOutputs, which
> > implements this functionality.
> >
> > My use case is to change the HDFS block size in one job to increase
> > parallelism, and I am doing that using the dfs.block.size configuration
> > property. Part of the output file is missing when I change this property
> > (a couple of last lines; in some cases half of a line is missing).
> >
> > I was debugging, and everything looks fine before the call to
> > outputs.write("successful", KEY, VALUE);
> >
> > For the output format I am using TextOutputFormat.
> >
> > When I remove MultipleOutputs from my code everything works OK.
> >
> > Is there something I am doing wrong, or is there an issue with
> > MultipleOutputs?
> >
> > regards,
> > dino
>
>
>
> --
> Harsh J
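Here is the promised sketch of the CASE 2 mapper, written against the new org.apache.hadoop.mapreduce API. It is not my actual code: MyMapper and transform() are placeholders, and the key/value choices are only for illustration.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class MyMapper extends Mapper<LongWritable, Text, Text, Text> {

    private MultipleOutputs<Text, Text> outputs;

    @Override
    protected void setup(Context context) {
        // One MultipleOutputs instance per task, created once in setup()
        outputs = new MultipleOutputs<Text, Text>(context);
    }

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        Text transformed = transform(line); // stands in for the real logic
        if (transformed != null) {
            // route good records to the "successful" named output
            outputs.write("successful", line, transformed);
        } else {
            // route bad records to the "failed" named output
            outputs.write("failed", line, new Text("transformation failed"));
        }
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        // MultipleOutputs keeps its own record writers open; per its
        // Javadoc it must be closed here, or buffered output can be lost.
        outputs.close();
    }

    private Text transform(Text line) {
        // placeholder for the real record transformation
        return line;
    }
}

For reference, the programmatic equivalent of the -D flags above would be to call this in the driver instead:

  MultipleOutputs.addNamedOutput(job, "successful",
      TextOutputFormat.class, Text.class, Text.class);
  MultipleOutputs.addNamedOutput(job, "failed",
      TextOutputFormat.class, Text.class, Text.class);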