Hi,

Is there any way I can get the size of each compressed (Gzip) block without
actually compressing the whole file? For example, I have 200 MB of
uncompressed data in HDFS and the block size is 64 MB, so the file occupies
4 blocks. I want to get the compressed size of each of those 4 blocks. The
result might look like: the first block is 15 MB, the second is 20 MB, the
third is 18 MB, and the fourth is 2 MB.

I was thinking of using a command like hadoop fsck -blocks -files
-locations to find the file behind each block, and then running something
like gzip -c FILENAME | wc -c on each one to get its compressed size.
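
Something along these lines is what I have in mind (a rough sketch only:
the HDFS path is a placeholder, the 64 MB constant would need to match
dfs.blocksize, and it streams the file once per block instead of touching
the datanodes' local block files, so it would be slow on big files):

#!/usr/bin/env bash
FILE=/user/abhishek/data.txt        # hypothetical HDFS path
BLOCK_SIZE=$((64 * 1024 * 1024))    # 64 MB, should match dfs.blocksize

# File length in bytes; 'hadoop fs -du' prints the size first.
TOTAL=$(hadoop fs -du "$FILE" | tail -1 | awk '{print $1}')
NBLOCKS=$(( (TOTAL + BLOCK_SIZE - 1) / BLOCK_SIZE ))

for (( i = 0; i < NBLOCKS; i++ )); do
  # Cut out just the i-th block-sized slice and gzip it on its own.
  BYTES=$(hadoop fs -cat "$FILE" \
          | tail -c +$(( i * BLOCK_SIZE + 1 )) \
          | head -c "$BLOCK_SIZE" \
          | gzip -c | wc -c)
  echo "block $i: $BYTES bytes compressed"
done

Would that give a reasonable per-block estimate, or is there a better way?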

Please advise.

Regards,
Abhishek
