Hi,

Is there any way I can get the size of each compressed (Gzip) block without actually compressing the file? For example, I have 200 MB of uncompressed data in HDFS and the block size is 64 MB, so the file occupies four blocks. I want to know the compressed size of each of those four blocks. The result might look like: the first block is 15 MB, the second is 20 MB, the third is 18 MB, and the fourth is 2 MB.
I was thinking of using something like hadoop fsck <path> -blocks -files -locations to find the block files, and then running something like gzip -c <blockfile> | wc -c on each one to get its compressed size (a rough sketch of what I mean is below). Please advise.

Regards,
Abhishek
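P.S. Roughly what I had in mind, as a sketch only; the HDFS path and the datanode data-directory path are made-up examples, and the real block locations would come from the fsck output:

    # 1) list which blocks make up the file and which datanodes hold them
    #    (the HDFS path is just an example)
    hadoop fsck /user/abhishek/data.txt -blocks -files -locations

    # 2) on a datanode holding those blocks, gzip each block file to stdout
    #    and count the bytes, without keeping the compressed output
    #    (the dfs.data.dir location below is just an example)
    for blk in /data/dfs/current/blk_*; do
        case "$blk" in *.meta) continue ;; esac   # skip checksum/metadata files
        printf '%s -> %s bytes compressed\n' "$blk" "$(gzip -c "$blk" | wc -c)"
    done

This compresses each block once just to measure it, so it is not free, but it avoids writing any compressed data back to HDFS. I am not sure this is the right way to map blocks to their on-disk files, so corrections are welcome.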