[jira] [Commented] (CASSANDRA-47) SSTable compression

Sylvain Lebresne (JIRA) Tue, 12 Jul 2011 13:05:25 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064101#comment-13064101
 ]


Sylvain Lebresne commented on CASSANDRA-47:
-------------------------------------------

bq. As I wrote before - currently to check real size of the file (tested only 
on linux because OS X FS saves empty blocks to the disk for some reason) you 
need to get a block count using 'ls -alhs', current patch reserves an empty 
space for each chunk because we need to do seeks while we write data using 
SSTableWriter.

Yeah, I really think we shouldn't do that (i.e., have empty space between the 
compressed chunks). I'm happy to learn that Linux (or at least whatever file 
system you are using; I haven't tried the patch on Linux yet) is smart enough 
to avoid allocating empty blocks, but we shouldn't rely on this. I bet not all 
file systems do that (OS X seems to prove it, and I'm not sure all Linux file 
systems do either), and anyway if you transfer the sstables or tar them or 
anything, it'll still be less efficient than necessary (because the file still 
*is* the size of the uncompressed data). We're also losing some space even on 
Linux, depending on the actual FS block size (not a big deal, but this can add 
up). So I think we really need to change the index (and key cache) to store the 
offset in the compressed data. Imho, the simplest way would be, for compressed 
files, to replace the current index entry (the key followed by the offset) with 
the key, then the position of the chunk in the compressed file, then the offset 
within the uncompressed chunk.
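
To make the proposed layout concrete, here is a minimal Java sketch of such an 
index entry. It is only an illustration, not the format used by the patch or by 
Cassandra: the class name CompressedIndexEntry, the field names and the field 
widths (2-byte key length, 8-byte chunk position, 4-byte offset) are all 
assumptions.

```java
import java.nio.ByteBuffer;

// Hypothetical index entry for a compressed SSTable (illustrative only).
// Layout assumed here: [key length: 2 bytes][key][chunk position: 8 bytes][offset in chunk: 4 bytes]
final class CompressedIndexEntry {
    final byte[] key;
    final long chunkPosition; // where the compressed chunk starts in the compressed file
    final int offsetInChunk;  // where the row starts inside that chunk once it is uncompressed

    CompressedIndexEntry(byte[] key, long chunkPosition, int offsetInChunk) {
        if (key.length > 0xFFFF) {
            throw new IllegalArgumentException("key too long for a 2-byte length prefix");
        }
        this.key = key.clone();
        this.chunkPosition = chunkPosition;
        this.offsetInChunk = offsetInChunk;
    }

    // Writes the entry into a new buffer that is ready for reading.
    ByteBuffer serialize() {
        ByteBuffer buf = ByteBuffer.allocate(2 + key.length + 8 + 4);
        buf.putShort((short) key.length);
        buf.put(key);
        buf.putLong(chunkPosition);
        buf.putInt(offsetInChunk);
        buf.flip();
        return buf;
    }

    // Reads one entry starting at the buffer's current position.
    static CompressedIndexEntry deserialize(ByteBuffer buf) {
        int keyLength = buf.getShort() & 0xFFFF; // treat the length prefix as unsigned
        byte[] key = new byte[keyLength];
        buf.get(key);
        long chunkPosition = buf.getLong();
        int offsetInChunk = buf.getInt();
        return new CompressedIndexEntry(key, chunkPosition, offsetInChunk);
    }
}
```

With an entry like this, a reader would seek to chunkPosition in the compressed 
file, uncompress that single chunk, and then skip offsetInChunk bytes to reach 
the row, so no empty space needs to be reserved between chunks.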

Another thing is that we will need compression to be optional (if only because 
we cannot expect people to trust this from day one). Don't get me wrong, it's 
nice to have a first prototype to get an idea of what we're talking about, but 
I just wanted to mention this because it's probably easier to take it into 
account sooner rather than later (I also suspect we may be able to factor out 
some code common to BRAF and CDF, but I haven't looked too closely, so maybe 
not).



> SSTable compression
> -------------------
>
>                 Key: CASSANDRA-47
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-47
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>              Labels: compression
>             Fix For: 1.0
>
>         Attachments: CASSANDRA-47.patch, snappy-java-1.0.3-rc4.jar
>
>
> We should be able to do SSTable compression which would trade CPU for I/O 
> (almost always a good trade).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
