Harsh,
Thanks for the input. Since my binary input file contains a fixed number of 
fixed-format binary records, wouldn't it be simpler to use FSDataInputStream 
to read the binary data copied to HDFS as a byte array? I could simply copy a 
file containing HDFS paths to inputDir, and a map would be invoked for each 
HDFS file, e.g.:

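     // Open the HDFS file with a 4096-byte read buffer.
     FSDataInputStream stm = fileSys.open(filename, 4096);
     byte[] actual = new byte[128];
     // readFully() fills the whole array; read() may return fewer bytes.
     stm.readFully(actual, 0, actual.length);
     stm.seek(4096);  // jump to byte offset 4096
     stm.close();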
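
A minimal sketch of the same idea without any Hadoop dependency, in case it helps to test the record parsing locally first. FSDataInputStream extends java.io.DataInputStream, so the readFully() call below should behave the same way on a stream opened from HDFS. The class name FixedRecords, the helper names, and the idea that each record starts with a 32-bit little-endian integer are assumptions for illustration only; the real record layout is whatever the C program writes.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of Hadoop: reads fixed-size records from a stream.
class FixedRecords {
    // Assumed record size, matching the 128-byte buffer in the snippet above.
    static final int RECORD_SIZE = 128;

    // Reads exactly one record; throws EOFException if the stream ends early.
    static byte[] readRecord(DataInputStream in) throws IOException {
        byte[] record = new byte[RECORD_SIZE];
        in.readFully(record); // never returns a partially filled buffer
        return record;
    }

    // Decodes a 32-bit int at the given offset, ASSUMING the C writer was
    // little-endian (DataInputStream.readInt() would assume big-endian).
    static int intAt(byte[] record, int offset) {
        return ByteBuffer.wrap(record).order(ByteOrder.LITTLE_ENDIAN).getInt(offset);
    }

    public static void main(String[] args) throws IOException {
        // Two fake records stand in for a file opened from HDFS.
        byte[] data = new byte[2 * RECORD_SIZE];
        data[RECORD_SIZE] = 0x2A; // first byte of the second record
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            readRecord(in);                 // first record
            byte[] second = readRecord(in); // second record
            System.out.println(intAt(second, 0)); // prints 42
        }
    }
}
```

The byte order matters because data written by a C program on x86 hardware is typically little-endian, while Java's DataInput methods read multi-byte values as big-endian.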

Let me know if this approach would work and if a potentially better approach 
exists. I am new to Hadoop so my question might seem too simplistic for some 
people.
Thanks,
Amritanshu

-----Original Message-----
From: Harsh J [mailto:ha...@cloudera.com] 
Sent: Tuesday, May 01, 2012 6:21 PM
To: hdfs-user@hadoop.apache.org
Cc: mlor...@uci.cu
Subject: Re: how to read binary data from hdfs

Amritanshu,

Implement your own custom InputFormat with a RecordReader and you can
read your files directly.

To learn how to implement custom readers/formats, you can refer to the
example under the subtitle "Processing a whole file as a record",
Page 206 | Chapter 7: MapReduce Types and Formats in Tom White's
Hadoop: The Definitive Guide, or you can read up on the details at
http://developer.yahoo.com/hadoop/tutorial/module5.html#inputformat.
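
As a rough illustration of what the core of such a reader might look like for fixed-length records, here is a plain-Java sketch with no Hadoop dependency. The class FixedLengthRecordIterator is hypothetical; its method names are borrowed from Hadoop's RecordReader (nextKeyValue, getCurrentKey, getCurrentValue) only to show where the logic would go, and the choice of the byte offset as the key is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for the core loop of a custom RecordReader.
class FixedLengthRecordIterator {
    private final DataInputStream in;
    private final byte[] value;   // reused for every record
    private long key = -1;        // byte offset of the current record
    private long nextOffset = 0;  // byte offset of the record to read next

    // recordLength must be at least 1.
    FixedLengthRecordIterator(InputStream in, int recordLength) {
        this.in = new DataInputStream(in);
        this.value = new byte[recordLength];
    }

    // Advances to the next record; returns false at a clean end of stream.
    boolean nextKeyValue() throws IOException {
        int first = in.read();
        if (first < 0) {
            return false; // no more records
        }
        value[0] = (byte) first;
        // Throws EOFException if the last record is truncated.
        in.readFully(value, 1, value.length - 1);
        key = nextOffset;
        nextOffset += value.length;
        return true;
    }

    long getCurrentKey() { return key; }

    byte[] getCurrentValue() { return value; }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[12]; // three fake 4-byte records
        FixedLengthRecordIterator it =
            new FixedLengthRecordIterator(new ByteArrayInputStream(data), 4);
        while (it.nextKeyValue()) {
            System.out.println("record at offset " + it.getCurrentKey());
        }
    }
}
```

In a real RecordReader the stream would come from opening the split's file on HDFS, and the value buffer is reused between calls here, so a caller that keeps records around would need to copy them.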

On Tue, May 1, 2012 at 3:42 PM, Amritanshu Shekhar
<amritanshu.shek...@exponential.com> wrote:
> Hi Guys,
> I want to read binary data (produced by a C program) that is copied to HDFS 
> using a Java program. The idea is that I would eventually write a MapReduce 
> job that uses the aforementioned program's output (the Java program would 
> read the binary data and create a Java object for the map function to use). 
> I read about the sequence file format that Hadoop supports, but converting 
> the binary data into sequence file format using Java serialization would add 
> another layer of complexity. Is there a simple, no-frills API that I can use 
> to read binary data directly from HDFS? Any help/resources would be deeply 
> appreciated.
> Thanks and Regards,
> Amritanshu



-- 
Harsh J
