[ 
https://issues.apache.org/jira/browse/CASSANDRA-836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839501#action_12839501
 ] 

Ross M commented on CASSANDRA-836:
----------------------------------

i'm not a moron. that was an example attached to the bug that quickly caused 
the problem to happen.

that said, it actually is smaller than the java serialization of the BitSet 
{2, 4} of length 5. the int version stores the ints 5, 2, 2, 4 (4 x 32 = 128 
bits). in java serialization the "java.util.BitSet" class name alone takes up 
128 bits (16 characters), then you have at least 64 bits for the actual bit 
values, plus anything else it sticks in there. there apparently are other 
things, b/c the default version results in ~592 bits of data. even with all 5 
bits turned on, the int version would take only 224 bits (7 ints).
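
for anyone who wants to check the numbers, here is a minimal sketch. it is not 
the attached BitSetSerializer.java; the layout (logical length, count of set 
bits, then each set index, all as 4-byte ints) is an assumption inferred from 
the "5, 2, 2, 4" example above, and the class and method names are 
hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.BitSet;

final class BitSetSizes {
    /**
     * Assumed compact layout: logical length, number of set bits,
     * then the index of each set bit, every value as a 4-byte int.
     */
    static byte[] writeCompact(BitSet bits, int length) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(length);
        out.writeInt(bits.cardinality());
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            out.writeInt(i);
        }
        out.flush();
        return buf.toByteArray();
    }

    /** Default java serialization of the same BitSet, for comparison. */
    static byte[] writeJava(BitSet bits) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(bits);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        BitSet two = new BitSet(5);
        two.set(2);
        two.set(4);
        BitSet all = new BitSet(5);
        all.set(0, 5); // bits 0..4

        // 4 ints and 7 ints respectively in the compact form
        System.out.println("compact {2,4}:  " + writeCompact(two, 5).length * 8 + " bits");
        System.out.println("compact all 5:  " + writeCompact(all, 5).length * 8 + " bits");
        // the exact java-serialized size depends on the JDK, so it is only printed
        System.out.println("java    {2,4}:  " + writeJava(two).length * 8 + " bits");
    }
}
```

note that the compact size grows by 32 bits per set bit, which is exactly the 
property that trips up a header rewritten in place.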

> CommitLogSegment::seekAndWriteCommitLogHeader assumes header size doesn't 
> change.
> ---------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-836
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-836
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: n/a - all
>            Reporter: Ross M
>            Priority: Minor
>         Attachments: BitSetSerializer.java
>
>
> CommitLogSegment::seekAndWriteCommitLogHeader assumes header size doesn't 
> grow. there are pieces of the header (BitSet) that are serialized with java 
> serialization which makes no such promises. 
> the following code:
>     /** writes header at the beginning of the file, then seeks back to 
> current position */
>     void seekAndWriteCommitLogHeader(byte[] bytes) throws IOException
>     {
>         long currentPos = logWriter.getFilePointer();
>         logWriter.seek(0);
>         writeCommitLogHeader(bytes);
>         logWriter.seek(currentPos);
>     }
> works fine as long as the header size doesn't change, but if it grows the new 
> header will overwrite the beginning of the data segment. the bit-set being 
> written in the header happens to serialize to the same size, but there is no 
> guarantee of this.
> i found this when looking at optimizing the serialization of data to disk 
> (thus improving write throughput/performance.) i removed the 
> ObjectOutputStream serialization in BitSetSerializer and replaced it with a 
> custom serialization that omits the generic java 
> serialization/ObjectOutputStream stuff and just writes only the "true" bits. 
> the custom serialization worked fine on its own, but broke other parts of the 
> code: when the header bitset had new bits turned on, the header grew and 
> data segment bytes were overwritten.
> the serialized version of a BitSet can grow in a similar manner, no promises 
> of size/consistency are made, but with current use it luckily doesn't seem to 
> happen.
> a good fix is unclear. without forcing the header to be a fixed/constant size 
> in some manner this problem could pop up at any point. it's generally not 
> safe to rewrite headers like this without custom code that ensures the size 
> doesn't change. one fix would be to manually write all of the header data out 
> (rather than relying on java serialization and serialization code in other 
> parts of cassandra not to change.) another might be to pad the size of the 
> header so that the data inside can grow, but that seems fraught with 
> (potential) problems. (i've played around with padding the header length, but 
> that seems to cause other things to break, which i haven't been able to track 
> down yet.)
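
one way to at least make the failure loud instead of silent is to check the 
size before rewriting. this is only a sketch under assumptions, not a proposed 
patch: it uses a plain RandomAccessFile rather than cassandra's logWriter, and 
headerSize is a hypothetical value the caller would have recorded when the 
segment was first written.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

final class GuardedHeaderRewrite {
    /**
     * Rewrites the header at offset 0 and then restores the file pointer,
     * but refuses to write if the new header is not exactly headerSize bytes.
     * A larger header would overwrite data; a smaller one would leave stale bytes.
     */
    static void seekAndWriteHeader(RandomAccessFile log, byte[] header, int headerSize)
            throws IOException {
        if (header.length != headerSize) {
            throw new IOException("header size changed from " + headerSize
                    + " to " + header.length + " bytes; refusing to rewrite in place");
        }
        long currentPos = log.getFilePointer();
        log.seek(0);
        log.write(header);
        log.seek(currentPos);
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("segment", ".log");
        f.deleteOnExit();
        try (RandomAccessFile log = new RandomAccessFile(f, "rw")) {
            log.write(new byte[] {1, 2, 3, 4}); // original 4-byte header
            log.write(new byte[] {9, 9});       // start of the data segment
            seekAndWriteHeader(log, new byte[] {5, 6, 7, 8}, 4); // same size: ok
            try {
                seekAndWriteHeader(log, new byte[5], 4);         // grown header
            } catch (IOException e) {
                System.out.println("rejected: " + e.getMessage());
            }
        }
    }
}
```

this doesn't fix the underlying problem (the header still can't grow), it just 
turns silent corruption of the data segment into an error at the point where 
the size assumption is violated.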

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
