I am not completely sure I understand this response; I was wondering the exact
same thing as Ryan. So I followed what you said here and did the following:

FileSystem fs = FileSystem.get(conf); // conf is my existing Configuration
Path p = new Path("/user/hadoop/indexweek/" + startOfWeek.getTime()
        + "_" + endOfWeek.getTime() + ".db");
// Gzip the bytes on the client before they are written into the DFS.
OutputStream dos = new GZIPOutputStream(fs.create(p));

Clearly the file is now stored compressed in the DFS, but how does Hadoop
recognize that the input is compressed? For example, when I browse to that file
using the DFS web interface I see the compressed version. Will only map/reduce
jobs see the uncompressed version, or should it also appear uncompressed when
viewed through the DFS web interface? And how can Hadoop correctly decompress
part of a file (a single block) when the file spans multiple blocks, without
decompressing the file as a whole?
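
For what it's worth, here is a minimal sketch of how I understand the
recognition step (I'm assuming the CompressionCodecFactory class in
org.apache.hadoop.io.compress is what the input side consults, and that it
matches codecs by filename suffix; please correct me if that's wrong):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

public class CodecCheck {
    public static void main(String[] args) {
        // Ask Hadoop which codec, if any, it associates with a path.
        // The factory matches on filename suffix (e.g. ".gz" -> GzipCodec).
        Configuration conf = new Configuration();
        CompressionCodecFactory factory = new CompressionCodecFactory(conf);
        CompressionCodec codec = factory.getCodec(new Path(args[0]));
        // null means no registered codec claims the suffix, so the file
        // would be read as plain, uncompressed input.
        System.out.println(codec == null
                ? "no codec matched"
                : codec.getClass().getName());
    }
}

If that lookup really is suffix-based, then my ".db" filename above presumably
wouldn't match any codec, which may be part of the answer to my question.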

Is automatic block compression on the roadmap for Hadoop? It seems like it
would be very useful. In my case the compression ratio is 9:1, which should
translate to significant space and I/O savings.

I haven't had a chance to run it with Pig yet to see whether map/reduce handles
it properly. Should I just expect it to work, or did I miss something?

Any clarification / correction would be appreciated.

Thanks,
Michael

-----Original Message-----
From: Stu Hood [mailto:[EMAIL PROTECTED] 
Sent: Monday, November 26, 2007 6:16 PM
To: [email protected]
Subject: RE: Whether the file is compressed before storing it to Hadoop?

Hadoop will not automatically compress a file that you place into it.

If you compress a file before placing it in Hadoop, MapReduce jobs use the
compression package to transparently decompress your gzipped files when reading
them as input.
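
Roughly, the read side looks like the sketch below. This is only an
illustration of the idea, not the exact MapReduce internals; it assumes the
CompressionCodecFactory / CompressionCodec classes from
org.apache.hadoop.io.compress, and the path shown is hypothetical:

// Open a file, decompressing it if a codec matches its filename suffix.
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path p = new Path("/user/hadoop/input.gz"); // hypothetical example path
CompressionCodecFactory factory = new CompressionCodecFactory(conf);
CompressionCodec codec = factory.getCodec(p);
InputStream in = (codec == null)
        ? fs.open(p)                           // no codec: read bytes as-is
        : codec.createInputStream(fs.open(p)); // codec: wrap to decompress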

Thanks,
Stu

-----Original Message-----
From: Ryan <[EMAIL PROTECTED]>
Sent: Monday, November 26, 2007 8:16pm
To: [email protected]
Subject: Whether the file is compressed before storing it to Hadoop?

Hi,
I'm new to Hadoop and I'm confused about how files are stored. I found a zlib
implementation in the package org.apache.hadoop.io.compress, so I wonder: is a
file compressed before it is actually stored in Hadoop? That is, does Hadoop
store the file in its compressed form?

Sincerely,
Ryan
