[
https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064101#comment-13064101
]
Sylvain Lebresne commented on CASSANDRA-47:
-------------------------------------------
bq. As I wrote before - currently to check real size of the file (tested only
on linux because OS X FS saves empty blocks to the disk for some reason) you
need to get a block count using 'ls -alhs', current patch reserves an empty
space for each chunk because we need to do seeks while we write data using
SSTableWriter.
Yeah, I really think we shouldn't do that (i.e., have empty space between the
compressed chunks). I'm happy to learn that Linux (or at least whatever file
system you are using; I haven't tried the patch on Linux yet) is smart enough
to avoid allocating empty blocks, but we shouldn't rely on this. I bet not all
file systems do that (OS X seems to prove it, and I'm not sure all Linux file
systems do either), and anyway if you transfer the sstables or tar them or
anything, it'll still be less efficient than necessary (because the file still
*is* the size of the uncompressed data). We're also losing some space even on
Linux, depending on what the actual FS block size is (not a big deal, but this
can add up). So I think we really need to change the index (and key cache) to
store the offset in the compressed data. Imho, the simplest way would be, for a
compressed file, to replace the current index entry (the key followed by the
offset) with the key, then the position of the chunk in the compressed file,
then the offset within the uncompressed chunk.
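
To make the proposed index layout concrete, here is a minimal sketch of such an
entry. It is not code from the patch: the class name, the field names, the
on-disk encoding (2-byte key length, key bytes, 8-byte chunk position, 4-byte
offset) and the fixed uncompressed chunk length are all assumptions made for
illustration only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of an index entry for a compressed data file:
// key, then position of the chunk in the compressed file, then the
// offset of the row inside that chunk once it has been decompressed.
final class CompressedIndexEntrySketch {

    static final class Entry {
        final String key;
        final long chunkPosition; // start of the compressed chunk in the data file
        final int offsetInChunk;  // row offset within the uncompressed chunk

        Entry(String key, long chunkPosition, int offsetInChunk) {
            this.key = key;
            this.chunkPosition = chunkPosition;
            this.offsetInChunk = offsetInChunk;
        }
    }

    // Builds an entry from a position in the uncompressed stream.
    // Assumes every chunk holds exactly chunkLength uncompressed bytes and
    // that compressedChunkStarts[i] is where chunk i begins in the file.
    static Entry locate(String key, long uncompressedPosition,
                        int chunkLength, long[] compressedChunkStarts) {
        int chunk = (int) (uncompressedPosition / chunkLength);
        int offset = (int) (uncompressedPosition % chunkLength);
        return new Entry(key, compressedChunkStarts[chunk], offset);
    }

    // Assumed encoding: key length (2 bytes), key, chunk position, offset.
    static byte[] serialize(Entry e) {
        byte[] key = e.key.getBytes(StandardCharsets.UTF_8);
        if (key.length > 0xFFFF) {
            throw new IllegalArgumentException("key too long for a 2-byte length");
        }
        ByteBuffer buf = ByteBuffer.allocate(2 + key.length + 8 + 4);
        buf.putShort((short) key.length);
        buf.put(key);
        buf.putLong(e.chunkPosition);
        buf.putInt(e.offsetInChunk);
        return buf.array();
    }

    static Entry deserialize(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int keyLength = buf.getShort() & 0xFFFF; // read the length as unsigned
        byte[] key = new byte[keyLength];
        buf.get(key);
        long chunkPosition = buf.getLong();
        int offsetInChunk = buf.getInt();
        return new Entry(new String(key, StandardCharsets.UTF_8),
                         chunkPosition, offsetInChunk);
    }
}
```

With entries like this, a reader would seek to chunkPosition, decompress that
one chunk, and skip offsetInChunk bytes, so the data file needs no empty space
reserved between chunks.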
Another thing is that we will need compression to be optional (if only because
we cannot expect people to trust this from day one). Don't get me wrong, it's
nice to have a first prototype to get an idea of what we're talking about, but
I just wanted to mention this because it's probably easier to take that into
account sooner rather than later (I also suspect we may be able to factor out
some of the code of BRAF and CDF, but I haven't looked too closely, so maybe
not).
> SSTable compression
> -------------------
>
> Key: CASSANDRA-47
> URL: https://issues.apache.org/jira/browse/CASSANDRA-47
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Jonathan Ellis
> Assignee: Pavel Yaskevich
> Labels: compression
> Fix For: 1.0
>
> Attachments: CASSANDRA-47.patch, snappy-java-1.0.3-rc4.jar
>
>
> We should be able to do SSTable compression which would trade CPU for I/O
> (almost always a good trade).
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira