Hi Jiamin,

Thank you once again. Let me explain my scenario a bit. I am using Amazon
Elastic MapReduce, so the output file is written to a folder inside S3.

I have only a single reduce task, and inside it I do:

byte[] bytes = some-code-to-generate-bytes
output.collect(new Text("key"), new BytesWritable(bytes));

In the main method I have set the job configuration's output format to
SequenceFileOutputFormat.
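For completeness, my driver setup looks roughly like the following (this is a sketch using the old org.apache.hadoop.mapred API; the class name and output path are placeholders, not my real job or bucket):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;

public class MyDriver {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(MyDriver.class);

    // Emit (Text, BytesWritable) pairs into a SequenceFile
    conf.setOutputFormat(SequenceFileOutputFormat.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(BytesWritable.class);

    // Single reducer, so everything lands in one part file
    conf.setNumReduceTasks(1);

    // Placeholder path; on Elastic MapReduce this would be an s3:// URI
    FileOutputFormat.setOutputPath(conf, new Path(args[0]));
  }
}
```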


Now when I run this it creates a file in the given S3 output directory, as
expected. I have a Java client on my local machine which downloads this file
from S3 and tries to read it. The issue comes when reading the file, because
I am not sure how to read it to get back the original bytes I wrote from the
reduce task. I looked into SequenceFileOutputFormat, and it seems the file
contains a header and a body. So do I have to read it manually as bytes and
extract the portion I need, or is there a built-in API class for reading
such a file?
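From the javadocs, I believe SequenceFile.Reader is meant for reading these files back, so something along these lines is what I imagine the client would do (the local file name here is just a placeholder for the downloaded part file):

```java
import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class ReadPartFile {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The file was downloaded from S3, so read it from the local filesystem
    FileSystem fs = FileSystem.getLocal(conf);
    Path path = new Path("part-00000"); // placeholder local file name

    SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
    try {
      Text key = new Text();
      BytesWritable value = new BytesWritable();
      while (reader.next(key, value)) {
        // getBytes() returns the backing array, which may be larger than
        // the record; only the first getLength() bytes are valid
        byte[] bytes = Arrays.copyOf(value.getBytes(), value.getLength());
        System.out.println(key + " -> " + bytes.length + " bytes");
      }
    } finally {
      reader.close();
    }
  }
}
```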

Thank you
Saliya

On Wed, Mar 31, 2010 at 9:35 AM, welman Lu <[email protected]> wrote:

> Hi, Saliya,
>
> If by "part files" you mean the results of the reduce function stored
> inside HDFS, then I think this example from "Hadoop: The Definitive Guide"
> can help you.
> -------------
> Example 3-1. Displaying files from a Hadoop filesystem on standard output
> using a URLStreamHandler
> import java.io.InputStream;
> import java.net.URL;
> import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
> import org.apache.hadoop.io.IOUtils;
>
> public class URLCat {
>   static {
>     URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
>   }
>
>   public static void main(String[] args) throws Exception {
>     InputStream in = null;
>     try {
>       in = new URL(args[0]).openStream();
>       IOUtils.copyBytes(in, System.out, 4096, false);
>     } finally {
>       IOUtils.closeStream(in);
>     }
>   }
> }
>
> Take a try, good luck!
>
>
> Best Regards
> Jiamin Lu




-- 
Saliya Ekanayake
http://www.esaliya.blogspot.com
http://www.esaliya.wordpress.com
