Thanks for your email, Robert! IMHO compression has other effects (pegging CPUs, needing more memory). If you enable compression on all blocks, you can't provide uncompressed performance (it's arguable whether compression will always be faster or slower). Regardless, users are free to compress at the application level, so if they expect to gain from compression, they can enable it there; a rough sketch is below. If compression would harm my application, there'd be no way to disable it, i.e. this would be a one-way street. Please correct me if I misunderstand your proposal.
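For example, an application can already write gzip-compressed data into HDFS today using the stock codec classes, and HDFS never sees anything but opaque bytes. A rough sketch (the paths and class name here are made up for illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;

public class CompressedHdfsWrite {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Codec is chosen from the destination's extension (.gz here);
    // HDFS only ever stores the already-compressed bytes.
    Path dst = new Path("/user/example/data.txt.gz");   // made-up path
    CompressionCodec codec = new CompressionCodecFactory(conf).getCodec(dst);

    try (InputStream in = new FileInputStream("data.txt");   // made-up local file
         OutputStream out = codec.createOutputStream(fs.create(dst))) {
      IOUtils.copyBytes(in, out, conf);
    }
  }
}

Reading it back is symmetric (createInputStream), though of course a single .gz file written this way isn't splittable, which is the problem this thread is about.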
Also, HDFS's use case is rapidly shifting from long streaming reads to a more diverse workload. I've seen some JIRAs with common themes, though:
https://issues.apache.org/jira/browse/HADOOP-13114
https://issues.apache.org/jira/browse/HADOOP-13340
I am wondering if users of Hadoop are seeing some problems that may make more sense to deal with at a system level.

Thanks,
Ravi

On Mon, Jul 4, 2016 at 7:16 AM, Robert James <srobertja...@gmail.com> wrote:
> A lot of work in Hadoop concerns splittable compression. Could this
> be solved by offering compression at the HDFS block (i.e. 64 MB) level,
> just like many OS filesystems do?
>
> http://stackoverflow.com/questions/6511255/why-cant-hadoop-split-up-a-large-text-file-and-then-compress-the-splits-using-g?rq=1
> discusses this and suggests the issue is separation of concerns.
> However, if the compression is done at the *HDFS block* level (with
> perhaps a single flag indicating such), this would be totally
> transparent to readers and writers. This is the exact way, for
> example, NTFS compression works; apps need no knowledge of the
> compression. HDFS, since it doesn't allow random reads and writes,
> but only streaming, is a perfect candidate for this.
>
> Thoughts?