[ 
https://issues.apache.org/jira/browse/HDFS-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355349#comment-14355349
 ] 

Hari Sekhon commented on HDFS-2115:
-----------------------------------

MapR-FS provides transparent compression at the filesystem level - it's a very 
good idea.

It could be done on a directory basis (like MapR) with specific subdirectory 
and file / file extension exclusions, such as a .ignore_compress file in the 
directory.
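
As a rough illustration of the exclusion idea above, here is a minimal Python 
sketch. Nothing in it is an existing HDFS or MapR API: the .ignore_compress 
format (one glob pattern per line, a trailing "/" marking a subdirectory, "#" 
for comments) and the function names parse_ignore_file and should_compress are 
assumptions made up for this example.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def parse_ignore_file(text):
    """Return the glob patterns from a hypothetical .ignore_compress file.

    Assumed format: one pattern per line; blank lines and lines starting
    with '#' are skipped.
    """
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def should_compress(rel_path, patterns):
    """Return False if the file name or a parent directory is excluded.

    rel_path is relative to the directory that holds .ignore_compress.
    """
    parts = PurePosixPath(rel_path).parts
    for pattern in patterns:
        # File name / extension exclusion, e.g. "*.gz".
        if fnmatch(parts[-1], pattern):
            return False
        # Subdirectory exclusion, e.g. "raw/" (assumed trailing-slash syntax).
        if pattern.endswith("/") and pattern.rstrip("/") in parts[:-1]:
            return False
    return True


patterns = parse_ignore_file("# already compressed\n*.gz\nraw/\n")
should_compress("logs/app.log", patterns)     # plain text: compress it
should_compress("logs/app.log.gz", patterns)  # excluded by extension
should_compress("raw/data.txt", patterns)     # excluded by subdirectory
```

Reading the exclusions from a file inside the directory keeps the policy next 
to the data it governs, which is the appeal of the per-directory approach.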

Keeping files in plain text format makes it easier to use different tools on 
them without worrying about codec or container format support etc, but 
currently one can pay an 8x storage penalty for keeping uncompressed text.

This would solve some real problems for us right now if we had it. It's also 
frustrating that many tools are always demonstrated reading text files, yet 
storing plain text is so costly without transparent compression. We are 
actually stuck with a large historical archive of compressed files that we 
can't work with (there is no zip InputFormat), and we can't leave them 
uncompressed either because the storage waste would exceed our cluster 
capacity. Having to reprocess them all into a different compression format, 
and then hope that all future tools can handle that format, is far worse than 
simply having transparent compression.

The increasing proliferation of tools and products on Hadoop exacerbates this 
issue as we can never be sure that the next tool will support format X. 
Everything supports text. Please add transparent compression to make working 
with text better.

Regards,

Hari Sekhon
http://www.linkedin.com/in/harisekhon

> Transparent compression in HDFS
> -------------------------------
>
>                 Key: HDFS-2115
>                 URL: https://issues.apache.org/jira/browse/HDFS-2115
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, hdfs-client
>            Reporter: Todd Lipcon
>
> In practice, we find that a lot of users store text data in HDFS without 
> using any compression codec. Improving usability of compressible formats like 
> Avro/RCFile helps with this, but we could also help many users by providing 
> an option to transparently compress data as it is stored.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
