Hi Tariq,

Is your file splittable? If it's not, a single mapper will process the entire file in one go!
http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.html#isSplitable%28org.apache.hadoop.mapreduce.JobContext,%20org.apache.hadoop.fs.Path%29
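One rough way to estimate how many mappers a job gets: for FileInputFormat-based jobs there is one map task per input split. The sketch below is a simplified, hypothetical plain-Java model of FileInputFormat's split arithmetic (the SPLIT_SLOP factor of 1.1 and the max(minSize, min(maxSize, blockSize)) split-size formula), assuming a single non-empty file and ignoring block locations and configuration lookups. SplitCountModel and its methods are illustrative names, not Hadoop API. When isSplitable() returns false (as TextInputFormat does for a gzip-compressed file, for example), the whole file becomes one split.

```java
// Simplified, hypothetical model of how FileInputFormat turns one file into
// splits (one map task per split). Block locations, multiple input files and
// configuration lookups are deliberately ignored.
final class SplitCountModel {
    // FileInputFormat lets the final split be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    // Mirrors the max(minSize, min(maxSize, blockSize)) split-size formula.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of splits for a single non-empty file of fileLength bytes.
    static int countSplits(long fileLength, long splitSize, boolean splitable) {
        if (!splitable) {
            return 1; // isSplitable() == false: one mapper reads the whole file
        }
        int splits = 0;
        long bytesRemaining = fileLength;
        while (((double) bytesRemaining) / splitSize > SPLIT_SLOP) {
            splits++;
            bytesRemaining -= splitSize;
        }
        if (bytesRemaining != 0) {
            splits++; // the tail, at most SPLIT_SLOP * splitSize bytes long
        }
        return splits;
    }
}
```

With this model, a 1000-byte file and a 300-byte split size give 4 map tasks, the same file marked non-splittable gives 1, and a 330-byte file stays in a single split because 330/300 = 1.1 does not exceed the slop.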
How many mappers are being created? See if that helps.

Regards,
Alok

On Thu, Aug 2, 2012 at 3:48 PM, Mohammad Tariq <donta...@gmail.com> wrote:
> Thanks for the response, Harsh and Sri. Actually, I was trying to prepare
> a template for my application that reads one line at a time, extracts
> the first field from it, and emits that extracted value from the
> mapper. I have these few lines of code for that:
>
> public static class XPTMapper extends Mapper<IntWritable, Text, LongWritable, Text>{
>
>     public void map(LongWritable key, Text value, Context context)
>             throws IOException, InterruptedException{
>
>         Text word = new Text();
>         String line = value.toString();
>         if (!line.startsWith("TT")){
>             context.setStatus("INVALID LINE..SKIPPING........");
>         }else{
>             String stdid = line.substring(0, 7);
>             word.set(stdid);
>             context.write(key, word);
>         }
>     }
> }
>
> But the output file contains all the rows of the input file, including
> the lines I was expecting to be skipped. Also, I was expecting only
> the fields I am emitting, but the file contains entire lines.
> Could you guys please point out the mistake I might have made?
> (Pardon my ignorance, as I am not very good at MapReduce.) Many thanks.
>
> Regards,
> Mohammad Tariq
>
>
> On Thu, Aug 2, 2012 at 10:58 AM, Sriram Ramachandrasekaran
> <sri.ram...@gmail.com> wrote:
>> Wouldn't it be better to skip those unwanted lines upfront
>> (preprocess) and have a file that is ready to be processed by the MR
>> system? In any case, more details are needed.
>>
>>
>> On Thu, Aug 2, 2012 at 8:23 AM, Harsh J <ha...@cloudera.com> wrote:
>>>
>>> Mohammad,
>>>
>>> > But it seems I am not doing things in the correct way. Need some guidance.
>>>
>>> What do you mean by the above? What exactly is your code expected
>>> to do, and what is it not doing? Since you are asking a code question
>>> here, could you share the code with us (pastebin, gists, etc.)?
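On the XPTMapper code quoted above, one likely explanation for seeing every input line in the output: the class is declared as Mapper<IntWritable, Text, LongWritable, Text>, but its map method takes a LongWritable key, so it overloads rather than overrides Mapper.map(IntWritable, Text, Context). The framework then calls the inherited default map, which passes each (offset, line) record through unchanged. Declaring the input key type as LongWritable (the byte-offset key TextInputFormat supplies) and annotating map with @Override would let the compiler flag such a mismatch. Separately, line.substring(0, 7) returns only 7 characters, whereas the original question describes an 8-byte first field. The plain-Java sketch below, with hypothetical class and method names and no Hadoop dependency, models only the per-line logic the mapper appears to intend.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java model of the per-line logic XPTMapper was meant to
// apply. The "TT" prefix comes from the quoted code; the 8-character width
// comes from the original question ("the first field is of 8 bytes").
final class FirstFieldFilter {
    static final String VALID_PREFIX = "TT";
    static final int FIRST_FIELD_WIDTH = 8;

    // Returns the first field, or null when the line should be skipped.
    static String extractFirstField(String line) {
        if (!line.startsWith(VALID_PREFIX) || line.length() < FIRST_FIELD_WIDTH) {
            return null; // corresponds to the "INVALID LINE..SKIPPING" branch
        }
        // The end index of substring is exclusive: (0, 8) yields 8 characters.
        return line.substring(0, FIRST_FIELD_WIDTH);
    }

    // Applies extractFirstField to every line and keeps only emitted values.
    static List<String> mapAll(List<String> lines) {
        List<String> emitted = new ArrayList<>();
        for (String line : lines) {
            String field = extractFirstField(line);
            if (field != null) {
                emitted.add(field);
            }
        }
        return emitted;
    }
}
```

Only lines starting with "TT" produce output, and each output is just the first field, which is the behavior the quoted message expected from the job.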
>>>
>>> For skipping 8 lines, if you are using splits, you need to detect
>>> within the mapper or your record reader whether the map task's FileSplit
>>> has an offset of 0 and, if so, skip 8 line reads (because it's the
>>> first split of the file).
>>>
>>> On Thu, Aug 2, 2012 at 1:54 AM, Mohammad Tariq <donta...@gmail.com> wrote:
>>> > Hello list,
>>> >
>>> > I have a flat file in which data is stored as lines of 107
>>> > bytes each. I need to skip the first 8 lines (as they don't contain any
>>> > valuable info). Thereafter, I have to read each line and extract the
>>> > information from it, but not the line as a whole. Each line is
>>> > composed of several fields without any delimiter between them. For
>>> > example, the first field is 8 bytes, the second is 2 bytes, and so on. I
>>> > was trying to read each line as a Text value, convert it into a String,
>>> > and use the String.substring() method to extract the value of each field.
>>> > But it seems I am not doing things in the correct way. Need some
>>> > guidance. Many thanks.
>>> >
>>> > Regards,
>>> > Mohammad Tariq
>>>
>>>
>>>
>>> --
>>> Harsh J
>>
>>
>>
>>
>> --
>> It's just about how deep your longing is!
>>

--
Alok Kumar
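Harsh's suggestion (skip the 8 header lines only in the split that starts at offset 0) can be combined with the fixed-width parsing from the original question. The sketch below is plain Java with hypothetical names: it takes the split's start offset and its lines as arguments, whereas inside a real Mapper the offset could be read with ((FileSplit) context.getInputSplit()).getStart() and the skip counter kept in a field of the mapper. The thread gives only the first two field widths (8 and 2 bytes), so the layout is passed in as an array rather than hard-coded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of processing one input split of the fixed-width file.
final class FixedWidthSplitReader {
    // From the thread: the first 8 lines of the file carry no useful data.
    static final int HEADER_LINES = 8;

    // Cuts one record into consecutive fields of the given widths.
    // Throws IllegalArgumentException if the record is too short.
    static List<String> parseFields(String line, int[] widths) {
        List<String> fields = new ArrayList<>();
        int pos = 0;
        for (int width : widths) {
            if (pos + width > line.length()) {
                throw new IllegalArgumentException("record shorter than field layout");
            }
            fields.add(line.substring(pos, pos + width));
            pos += width;
        }
        return fields;
    }

    // Only the split starting at byte offset 0 contains the file header,
    // so only that split skips the first HEADER_LINES lines.
    static List<List<String>> processSplit(long splitStart, List<String> lines, int[] widths) {
        List<List<String>> records = new ArrayList<>();
        int toSkip = (splitStart == 0) ? HEADER_LINES : 0;
        for (String line : lines) {
            if (toSkip > 0) {
                toSkip--;
                continue;
            }
            records.add(parseFields(line, widths));
        }
        return records;
    }
}
```

Tying the header check to the split offset, rather than to a per-task line count alone, is what keeps every other mapper from discarding its own first 8 records.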