I am not completely sure I understand this response. I was wondering exactly
the same thing as Ryan, so I followed what you said here and did the following:
// FileSystem.get(conf) is the idiomatic call; the static get() is inherited,
// so DistributedFileSystem.get(conf) resolves to the same method anyway.
FileSystem fs = FileSystem.get(conf);
Path p = new Path("/user/hadoop/indexweek/" + startOfWeek.getTime()
        + "_" + endOfWeek.getTime() + ".db");
// gzip-compress the stream on its way into HDFS
OutputStream dos = new GZIPOutputStream(fs.create(p));
Clearly the file is now stored compressed in the DFS, but how does Hadoop
recognize that the input is compressed? For example, when I browse to that file
using the DFS web interface I get the compressed version. Will only map/reduce
jobs see the uncompressed version, or should it also be uncompressed when
viewed through the DFS web interface? Also, how can Hadoop correctly decompress
part of a file (a single block) when the file spans multiple blocks, without
decompressing the file as a whole?
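For reference, here is a self-contained JDK sketch of the gzip round trip I am
relying on (no Hadoop dependencies; the class name is just mine). It also shows
why I am puzzled about per-block decompression: a gzip stream has to be read
sequentially from its first byte, so I do not see how a block in the middle of
the file could be decompressed on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip {

    // Compress bytes with gzip, the same stream format the snippet above
    // writes into HDFS via new GZIPOutputStream(fs.create(p)).
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(data);
        } // close() finishes the gzip trailer
        return buf.toByteArray();
    }

    // Decompress: the reader must start at byte 0 of the compressed stream;
    // there is no entry point in the middle of the data.
    static byte[] gunzip(byte[] compressed) throws IOException {
        try (GZIPInputStream in =
                new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return buf.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "hello hadoop hello hadoop".getBytes("UTF-8");
        byte[] restored = gunzip(gzip(original));
        System.out.println(new String(restored, "UTF-8"));
    }
}
```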
Is automatic block compression on the roadmap for Hadoop? It seems it would be
very useful. In my case the compression ratio is 9:1, which seems like it would
translate to significant space/IO savings.
I haven't had a chance to run it with Pig to see if map/reduce is working
properly. Should I just expect it to work correctly, or did I miss something?
Any clarification or correction would be appreciated.
Thanks,
Michael
-----Original Message-----
From: Stu Hood [mailto:[EMAIL PROTECTED]
Sent: Monday, November 26, 2007 6:16 PM
To: [email protected]
Subject: RE: Whether the file is compressed before store it to hadoop?
Hadoop will not automatically compress a file that you place into it.
If you compress a file before placing it in Hadoop, MapReduce jobs will use the
compression package to transparently decompress your gzipped files as input.
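As far as I know, that lookup is driven by the file name: Hadoop's
CompressionCodecFactory (in org.apache.hadoop.io.compress) picks a codec by
matching the file's extension, so a gzip stream written under an unrelated
name like ".db" would not be recognized. Here is a toy, JDK-only sketch of the
extension-lookup idea; the class name and the map contents are illustrative,
not Hadoop's actual codec registry.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CodecLookup {

    // Illustrative extension-to-codec table; Hadoop builds the real one
    // from the codecs registered in its configuration.
    static final Map<String, String> CODEC_BY_EXTENSION = new LinkedHashMap<>();
    static {
        CODEC_BY_EXTENSION.put(".gz", "gzip");
        CODEC_BY_EXTENSION.put(".bz2", "bzip2");
    }

    // Return the codec name for a file, or null if the extension is
    // unknown, in which case the bytes are read as-is, uncompressed.
    static String codecFor(String filename) {
        for (Map.Entry<String, String> e : CODEC_BY_EXTENSION.entrySet()) {
            if (filename.endsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(codecFor("part-00000.gz")); // matched by extension
        System.out.println(codecFor("weekindex.db"));  // not matched
    }
}
```

This is why renaming a gzip-compressed file to end in ".gz" is what makes the
transparent decompression kick in.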
Thanks,
Stu
-----Original Message-----
From: Ryan <[EMAIL PROTECTED]>
Sent: Monday, November 26, 2007 8:16pm
To: [email protected]
Subject: Whether the file is compressed before store it to hadoop?
Hi,
I'm new to Hadoop and I'm confused by the storage procedure. I found a zlib
implementation in the package org.apache.hadoop.io.compress, so I am wondering
whether a file is compressed before it is actually stored in Hadoop; that is,
does Hadoop store the compressed form of the file?
Sincerely,
Ryan