Hi Jiamin,
Thank you once again. Let me explain my scenario a bit. I am using Amazon
Elastic MapReduce, so the output file is written to a folder inside S3.
I have only a single reduce task, and inside it I do:
byte[] bytes = ...; // some code to generate bytes
output.collect(new Text("key"), new BytesWritable(bytes));
In the main method I have set the outputformat of the job configuration as
SequenceFileOutputFormat.
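For reference, my job setup looks roughly like the following (old mapred API; the MyJob class and bucket name here are just placeholders):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;

// Placeholder job setup; MyJob and the bucket name are made up
JobConf conf = new JobConf(MyJob.class);
conf.setOutputFormat(SequenceFileOutputFormat.class);
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(BytesWritable.class);
conf.setNumReduceTasks(1);
FileOutputFormat.setOutputPath(conf, new Path("s3://my-bucket/output"));
```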
Now, when I run this it creates a file in the given S3 output directory as
expected. I have a Java client on my local machine which downloads this file
from S3 and tries to read it. The issue comes when reading the file,
because I am not sure how to read it to get back the original bytes
I wrote from the reduce task. I looked into
SequenceFileOutputFormat, and it seems that the file contains a header and
a body. So do I have to read it manually as bytes and extract the portion
that I need, or is there a built-in API class to read such a file?
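From the javadocs it looks like SequenceFile.Reader might be what I need, though I am not sure it is the right approach. Roughly what I have in mind is the following (the local path name is just a placeholder for the downloaded part file):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class ReadPartFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.getLocal(conf);
        // "part-00000" is a placeholder for the file downloaded from S3
        Path path = new Path("part-00000");
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
        try {
            Text key = new Text();
            BytesWritable value = new BytesWritable();
            while (reader.next(key, value)) {
                // getBytes() returns the backing buffer, which may be longer
                // than the data, so copy only getLength() bytes
                byte[] bytes = new byte[value.getLength()];
                System.arraycopy(value.getBytes(), 0, bytes, 0, value.getLength());
                // 'bytes' should now hold what the reducer emitted
            }
        } finally {
            reader.close();
        }
    }
}
```

Is this the right way to do it, or is there something simpler?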
Thank you
Saliya
On Wed, Mar 31, 2010 at 9:35 AM, welman Lu <[email protected]> wrote:
> Hi, Saliya,
>
> If by the part files you mean the results of the reduce function that are
> stored in HDFS, then I think this example from "Hadoop: The Definitive
> Guide" can help you.
> -------------
> Example 3-1. Displaying files from a Hadoop filesystem on standard output
> using a URLStreamHandler
>
> import java.io.InputStream;
> import java.net.URL;
> import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
> import org.apache.hadoop.io.IOUtils;
>
> public class URLCat {
>     static {
>         // JVM-wide: a URLStreamHandlerFactory can only be set once per JVM
>         URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
>     }
>
>     public static void main(String[] args) throws Exception {
>         InputStream in = null;
>         try {
>             in = new URL(args[0]).openStream();
>             IOUtils.copyBytes(in, System.out, 4096, false);
>         } finally {
>             IOUtils.closeStream(in);
>         }
>     }
> }
>
> Give it a try, good luck!
>
>
> Best Regards
> Jiamin Lu
--
Saliya Ekanayake
http://www.esaliya.blogspot.com
http://www.esaliya.wordpress.com