Harsh, that was exactly the issue! Thanks very much for your help.
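For anyone who hits the same thing: the fix was to declare the input format explicitly on the reading job. Below is a rough sketch of what the driver can look like (I'm assuming the new org.apache.hadoop.mapreduce API here; the Text key/value types, class names, and output path are just illustrative placeholders, not the actual code from this thread):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class ReadSequenceFileDriver {

  // Placeholder mapper: with the input format set correctly it receives
  // one complete key/value record per map() call.
  public static class RecordMapper extends Mapper<Text, Text, Text, Text> {
    @Override
    protected void map(Text key, Text value, Context context)
        throws IOException, InterruptedException {
      context.write(key, value);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "read-sequence-file");
    job.setJarByClass(ReadSequenceFileDriver.class);

    // The missing piece: tell the job the input really is a SequenceFile,
    // so it uses the SequenceFile record reader and hands the mapper whole
    // records instead of fragments.
    job.setInputFormatClass(SequenceFileInputFormat.class);
    SequenceFileInputFormat.addInputPath(job, new Path("out/data"));

    job.setMapperClass(RecordMapper.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    TextOutputFormat.setOutputPath(job, new Path("out/result"));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The same idea applies on the writing side, with job.setOutputFormatClass(SequenceFileOutputFormat.class) alongside the output path and compression settings.

Tim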
On 19 August 2011 15:15, Harsh J <ha...@cloudera.com> wrote:
> Tim,
>
> Do you also set your I/O formats explicitly to SequenceFileInputFormat
> and SequenceFileOutputFormat? Via job.setInputFormat/setOutputFormat, I
> mean.
>
> Hadoop should not be splitting records across maps/mappers. There are
> specific test cases that ensure this does not happen, so it would seem
> strange if it did.
>
> On Fri, Aug 19, 2011 at 6:01 PM, Tim Fletcher <zigomu...@gmail.com> wrote:
> > Hi all,
> >
> > I am having issues using SequenceFileInputFormat to retrieve whole records.
> >
> > I have one job that is used to write to a SequenceFile:
> >
> > SequenceFileOutputFormat.setOutputPath(job, new Path("out/data"));
> > SequenceFileOutputFormat.setOutputCompressionType(job,
> > SequenceFile.CompressionType.NONE);
> >
> > I then have a second job that is meant to read the file for processing:
> >
> > SequenceFileInputFormat.addInputPath(job, new Path("out/data"));
> >
> > However, the values that I get as the arguments to the map part of my job
> > only seem to contain parts of the record. I am sure that I am missing
> > something rather fundamental about how Hadoop splits inputs to the Mapper,
> > but I can't seem to find a way to stop the records being split.
> >
> > Any help (or a pointer to a specific page in the docs) would be greatly
> > appreciated.
> >
> > Regards,
> > Tim
>
> --
> Harsh J