CommitLogSegment::seekAndWriteCommitLogHeader assumes header size doesn't change.
---------------------------------------------------------------------------------

                 Key: CASSANDRA-836
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-836
             Project: Cassandra
          Issue Type: Bug
          Components: Core
         Environment: n/a - all
            Reporter: Ross M
            Priority: Minor


CommitLogSegment::seekAndWriteCommitLogHeader assumes the header size doesn't
grow. However, parts of the header (the BitSet) are serialized with Java
serialization, which makes no such promise.

The following code:

    /** writes header at the beginning of the file, then seeks back to current position */
    void seekAndWriteCommitLogHeader(byte[] bytes) throws IOException
    {
        long currentPos = logWriter.getFilePointer();
        logWriter.seek(0);

        writeCommitLogHeader(bytes);

        logWriter.seek(currentPos);
    }

works fine as long as the header size doesn't change, but if the header grows,
the new header overwrites the beginning of the data segment. The BitSet written
in the header happens to serialize to the same size today, but nothing
guarantees this.
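To make the failure concrete, here is a small standalone sketch. It uses a
plain java.io.RandomAccessFile in place of Cassandra's logWriter (an
assumption; the real writer class isn't shown here) and replays the same
seek/write/seek sequence with a 4-byte header that later grows to 6 bytes. The
class name HeaderOverwriteDemo is hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class HeaderOverwriteDemo {
    // Writes oldHeader + data to a temp file, rewrites the header at offset 0
    // the same way seekAndWriteCommitLogHeader does, then returns whatever now
    // sits where the data segment originally started.
    static byte[] rewriteHeader(byte[] oldHeader, byte[] data, byte[] newHeader) {
        try {
            File f = File.createTempFile("commitlog-demo", ".log");
            f.deleteOnExit();
            try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
                raf.write(oldHeader);                 // header at offset 0
                raf.write(data);                      // data segment follows it

                long currentPos = raf.getFilePointer();
                raf.seek(0);
                raf.write(newHeader);                 // no check that sizes match
                raf.seek(currentPos);                 // restore position, as the original does

                byte[] readBack = new byte[data.length];
                raf.seek(oldHeader.length);           // original start of the data
                raf.readFully(readBack);
                return readBack;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = {10, 20, 30, 40};
        // Same-size header: the data survives.
        System.out.println(Arrays.toString(
            rewriteHeader(new byte[4], data, new byte[] {1, 1, 1, 1})));       // [10, 20, 30, 40]
        // Header grew by two bytes: the first two data bytes are clobbered.
        System.out.println(Arrays.toString(
            rewriteHeader(new byte[4], data, new byte[] {1, 1, 1, 1, 1, 1}))); // [1, 1, 30, 40]
    }
}
```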

I found this while looking at optimizing the serialization of data to disk (to
improve write throughput/performance). I removed the ObjectOutputStream
serialization in BitSetSerializer and replaced it with a custom serialization
that omits the generic Java serialization/ObjectOutputStream overhead and writes
only the "true" bits. The custom serialization itself worked fine, but it broke
other parts of the code: when new bits were turned on in the header's BitSet,
the header grew and overwrote bytes at the start of the data segment.
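For illustration, a custom BitSet serialization along the lines described
might look like the sketch below. The format (an int count followed by one int
index per set bit) and the class name CompactBitSetSerializer are hypothetical,
not the actual patch. The point is that the output size is proportional to the
number of set bits, so turning on a bit grows the header.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.BitSet;

public class CompactBitSetSerializer {
    // Hypothetical format: int count, then one int per set bit index.
    static byte[] serialize(BitSet bits) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeInt(bits.cardinality());
            for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
                out.writeInt(i);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray(); // stream is closed (and flushed) by now
    }

    public static void main(String[] args) {
        BitSet dirty = new BitSet();
        dirty.set(0);
        System.out.println(serialize(dirty).length); // 8
        dirty.set(3);                                // one more "true" bit
        System.out.println(serialize(dirty).length); // 12: the header grew
    }
}
```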

The Java-serialized form of a BitSet can grow in a similar manner; no promises
about its size or consistency are made, but with current usage it luckily
doesn't seem to happen.
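The same growth can be observed with stock Java serialization. The sketch below
(class and method names are hypothetical) measures the ObjectOutputStream size
of two BitSets. On OpenJDK, BitSet.writeObject writes its backing long[]
trimmed to the words actually in use (unless the set was created with an
explicit size), so setting a higher bit typically adds 8 bytes per extra 64-bit
word. Exact byte counts depend on the JDK, so none are shown.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.BitSet;

public class BitSetSerializedSize {
    // Number of bytes ObjectOutputStream produces for a single BitSet.
    static int serializedSize(BitSet bits) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(bits);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.size();
    }

    public static void main(String[] args) {
        BitSet low = new BitSet();   // default constructor: size not fixed
        low.set(0);
        BitSet high = new BitSet();
        high.set(0);
        high.set(200);               // needs a 4th 64-bit word
        System.out.println(serializedSize(low) + " vs " + serializedSize(high));
    }
}
```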

A good fix is unclear. Without forcing the header to a fixed/constant size in
some manner, this problem could surface at any point. It's generally not safe
to rewrite headers like this without custom code that ensures the size doesn't
change. One fix would be to write all of the header data out manually (rather
than relying on Java serialization, and on serialization code elsewhere in
Cassandra, not to change). Another might be to pad the header so that the data
inside can grow, but that seems fraught with (potential) problems. (I've played
around with padding the header length, but that seems to cause other things to
break, which I haven't been able to track down yet.)
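One way to combine the two ideas above (write the header out manually, and
reserve room for growth) is a fixed-layout header. The sketch below assumes,
hypothetically, that the header holds a dirty-bit BitSet plus one int position
per column family, and that the number of column families has a known upper
bound MAX_CF; neither the layout, the constant, nor the class name
FixedSizeHeader comes from Cassandra's code. Every header serializes to exactly
SIZE bytes regardless of which bits are set.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.BitSet;

public class FixedSizeHeader {
    // Hypothetical upper bound on column families; it fixes the layout.
    static final int MAX_CF = 128;
    static final int WORDS = (MAX_CF + 63) / 64;
    // Layout: WORDS longs for the dirty bits, then MAX_CF ints for positions.
    static final int SIZE = WORDS * Long.BYTES + MAX_CF * Integer.BYTES;

    static byte[] serialize(BitSet dirty, int[] positions) {
        if (dirty.length() > MAX_CF || positions.length > MAX_CF)
            throw new IllegalArgumentException("header exceeds fixed capacity");
        long[] words = dirty.toLongArray();
        ByteArrayOutputStream bos = new ByteArrayOutputStream(SIZE);
        try (DataOutputStream out = new DataOutputStream(bos)) {
            for (int i = 0; i < WORDS; i++)
                out.writeLong(i < words.length ? words[i] : 0L);     // zero-pad unused words
            for (int i = 0; i < MAX_CF; i++)
                out.writeInt(i < positions.length ? positions[i] : 0); // zero-pad unused slots
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray(); // always SIZE bytes
    }

    public static void main(String[] args) {
        BitSet all = new BitSet();
        all.set(0, MAX_CF);
        int[] positions = new int[MAX_CF];
        Arrays.fill(positions, 12345);
        System.out.println(serialize(new BitSet(), new int[0]).length); // 528
        System.out.println(serialize(all, positions).length);           // 528
    }
}
```

With a layout like this, seekAndWriteCommitLogHeader could also defensively
compare bytes.length against the size of the header originally written and
throw instead of overwriting data. The trade-off is that MAX_CF becomes a hard
limit baked into the on-disk format.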

-- 
This message is automatically generated by JIRA.