Amritanshu,

Implement your own custom InputFormat with a RecordReader, and you can read your files directly.
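Something along these lines should work (a rough sketch in the spirit of the whole-file example referenced below; the class names, the NullWritable/BytesWritable key/value choice, and the use of the new-API packages are my assumptions, so adapt as needed):

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    public class WholeFileInputFormat
        extends FileInputFormat<NullWritable, BytesWritable> {

      @Override
      protected boolean isSplitable(JobContext context, Path file) {
        // Each binary file is handed to a mapper as a single record, so never split it.
        return false;
      }

      @Override
      public RecordReader<NullWritable, BytesWritable> createRecordReader(
          InputSplit split, TaskAttemptContext context)
          throws IOException, InterruptedException {
        WholeFileRecordReader reader = new WholeFileRecordReader();
        reader.initialize(split, context);
        return reader;
      }
    }

    class WholeFileRecordReader
        extends RecordReader<NullWritable, BytesWritable> {

      private FileSplit fileSplit;
      private Configuration conf;
      private final BytesWritable value = new BytesWritable();
      private boolean processed = false;

      @Override
      public void initialize(InputSplit split, TaskAttemptContext context)
          throws IOException, InterruptedException {
        this.fileSplit = (FileSplit) split;
        this.conf = context.getConfiguration();
      }

      @Override
      public boolean nextKeyValue() throws IOException, InterruptedException {
        if (!processed) {
          // Read the whole binary file into a BytesWritable; the map function can
          // then deserialize the bytes into a Java object however it likes.
          byte[] contents = new byte[(int) fileSplit.getLength()];
          Path file = fileSplit.getPath();
          FileSystem fs = file.getFileSystem(conf);
          FSDataInputStream in = null;
          try {
            in = fs.open(file);
            IOUtils.readFully(in, contents, 0, contents.length);
            value.set(contents, 0, contents.length);
          } finally {
            IOUtils.closeStream(in);
          }
          processed = true;
          return true;
        }
        return false;
      }

      @Override
      public NullWritable getCurrentKey() { return NullWritable.get(); }

      @Override
      public BytesWritable getCurrentValue() { return value; }

      @Override
      public float getProgress() { return processed ? 1.0f : 0.0f; }

      @Override
      public void close() throws IOException { }
    }

You would then set it on the job with job.setInputFormatClass(WholeFileInputFormat.class), and each map() call receives the raw bytes of one file to deserialize into your object.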
To learn how to implement custom readers/formats, see the example under "Processing a whole file as a record" (page 206, Chapter 7: MapReduce Types and Formats) in Tom White's Hadoop: The Definitive Guide, or read up on the details at http://developer.yahoo.com/hadoop/tutorial/module5.html#inputformat.

On Tue, May 1, 2012 at 3:42 PM, Amritanshu Shekhar <amritanshu.shek...@exponential.com> wrote:
> Hi guys,
> I want to read binary data (produced by a C program) that has been copied to HDFS,
> using a Java program. The idea is that I would eventually write a map-reduce job
> that uses the aforementioned program's output (the Java program would read the
> binary data and create a Java object which the map function would use). I read
> about the sequence file format that Hadoop supports, but converting the binary data
> into sequence file format using Java serialization would add another layer of
> complexity. Is there a simple, no-frills API that I can use to read binary data
> directly from HDFS? Any help/resources would be deeply appreciated.
> Thanks and Regards,
> Amritanshu

--
Harsh J