[jira] [Commented] (CASSANDRA-47) SSTable compression

Sylvain Lebresne (JIRA) Wed, 27 Jul 2011 08:05:34 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071783#comment-13071783
 ]


Sylvain Lebresne commented on CASSANDRA-47:
-------------------------------------------

On the compression option, we can also have a compression flag and latter add a 
compression_options (as we do for the compaction_strategy). There is a good 
change we will end up with more than one option (there is at least the 
algorithm and chunk size that make sense, maybe Terje idea (though clearly out 
of the scope for this ticket). Some algorithms may also have tunables between 
compression and speed).
Of course, we could also do all of this with a single String (that we would 
parse server side), or with a mix of both solutions. Seems slightly simpler to 
just have a flag to turn compression on/off with good default and have options 
aside for those who care. Just saying, I don't really care so much.

bq. didn't change visibility of the sync and flush methods because this 
requires further consideration and should be done in the separate task

We *need* to do something for this ticket. Right now, if someone call the 
public flush or sync methods of CSW mistakenly, it will corrupt data. So we 
should at least have the public flush and sync throw an 
UnsupportedOperationException. I'm not saying it will be particularly clean, 
but I'll take "slightly ugly and safe" over "cleaner but dangerous" anytime.  
I'm fine with leaving a "cleaner" refactoring of this to a separate task though.

Other comments:
- resetAndTruncate is still a problem. What is saved in the mark now is the 
chunkOffset, that is the offset to the start of the current chunk, but this 
ignores everything that could be in the buffer at the time the mark is set.  So 
when you resetAndTruncate, you will not go back to the position you were when 
calling mark, but at the beginning of the chunk.
  The mark should save both the chunkOffset and the bufferOffset. Then 
resetAndTruncate must seek back to chunkOffset, uncompress the chunk in the 
buffer and set bufferOffset to what was saved.
  Nit: could be worth optimizing (in SW and CSW) for the case where the mark is 
in the current buffer. It's not unlikely to be the case given when we use that.
- In CSW, when resetBuffer is called, current is supposed to either be 
"aligned" on chunk boundary or we're closing the file. So it seems there is no 
need to realign bufferOffset, and thus no need to override resetBuffer.
- The truncateAndClose of CSW doesn't seem to truncate anything. It also 
doesn't honor skipIOCache correctly since it doesn't call the truncateAndClose 
of SW. But actually, I think that if the only backward seek we do is through 
resetAndTruncate, then there is no need to truncate on close (neither for SW 
nor CSW). So we should probably get rid of that function and move the relevant 
parts in close().
- Let's use readUTF/writeUTF to read/write the algorithm name in the metadata 
file. That's what we use for strings usually (and using a StringBuilder to read 
a string is a tad over the top).
- CompressionMetadata.readChunkOffsets() is buggy if the dataLength is a 
multiple of the chunckLength (we have one less chunk that what's computed then).
  Nit: a simple for(i = 0; i < nbChunks; ++i) loop would be cleaner (you can 
still capture the EOF and rethrow however).
- Maybe we should have a few unit tests for all this ? If we could make it so 
that the tests we already have for compaction and such are run with and without 
compression that would already be good.

Other Nits:
- No need to reset validBufferBytes in CSW.flushData(). It's done in 
resetBuffer (and not in SW.flushData(), so that'll improve symmetry).
- Chunk could be a static class in CompressionMetada I suppose.


> SSTable compression
> -------------------
>
>                 Key: CASSANDRA-47
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-47
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>              Labels: compression
>             Fix For: 1.0
>
>         Attachments: CASSANDRA-47-v2.patch, CASSANDRA-47-v3-rebased.patch, 
> CASSANDRA-47-v3.patch, CASSANDRA-47-v4.patch, CASSANDRA-47.patch, 
> snappy-java-1.0.3-rc4.jar
>
>
> We should be able to do SSTable compression which would trade CPU for I/O 
> (almost always a good trade).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-47) SSTable compression

Reply via email to