CommitLogSegment::seekAndWriteCommitLogHeader assumes header size doesn't
change.
---------------------------------------------------------------------------------
Key: CASSANDRA-836
URL: https://issues.apache.org/jira/browse/CASSANDRA-836
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: n/a - all
Reporter: Ross M
Priority: Minor
CommitLogSegment::seekAndWriteCommitLogHeader assumes the header size doesn't
grow. Parts of the header (the BitSet) are written with Java serialization,
which makes no such promise.
The following code:
    /** writes header at the beginning of the file, then seeks back to current position */
    void seekAndWriteCommitLogHeader(byte[] bytes) throws IOException
    {
        long currentPos = logWriter.getFilePointer();
        logWriter.seek(0);
        writeCommitLogHeader(bytes);
        logWriter.seek(currentPos);
    }
works fine as long as the header size doesn't change, but if it grows, the new
header will overwrite the beginning of the data segment. The BitSet written in
the header currently happens to serialize to the same size each time, but
there is no guarantee of this.
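
To make the failure mode concrete, here is a minimal standalone sketch of the same seek-write-seek pattern against a temporary file. The class and method names (HeaderOverwriteDemo, seekAndWriteHeader, dataAfterRewrite) are made up for illustration and are not Cassandra code; the header and data bytes are arbitrary.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class HeaderOverwriteDemo
{
    /** Same seek-write-seek pattern as seekAndWriteCommitLogHeader (illustrative copy). */
    static void seekAndWriteHeader(RandomAccessFile file, byte[] header) throws IOException
    {
        long currentPos = file.getFilePointer();
        file.seek(0);
        file.write(header);
        file.seek(currentPos);
    }

    /** Writes firstHeader plus data, rewrites the header, and returns the data as read back. */
    static String dataAfterRewrite(byte[] firstHeader, byte[] secondHeader, String data)
    {
        try
        {
            File tmp = File.createTempFile("segment", ".log");
            try (RandomAccessFile file = new RandomAccessFile(tmp, "rw"))
            {
                file.write(firstHeader);
                long dataStart = file.getFilePointer();
                byte[] dataBytes = data.getBytes(StandardCharsets.US_ASCII);
                file.write(dataBytes);

                seekAndWriteHeader(file, secondHeader);

                // Re-read the data segment from where it was originally written.
                byte[] readBack = new byte[dataBytes.length];
                file.seek(dataStart);
                file.readFully(readBack);
                return new String(readBack, StandardCharsets.US_ASCII);
            }
            finally
            {
                tmp.delete();
            }
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args)
    {
        // Same-size rewrite: the data segment is untouched.
        System.out.println(dataAfterRewrite(new byte[4], new byte[4], "ROWDATA"));
        // Header grew by two bytes: the first two data bytes are clobbered.
        System.out.println(dataAfterRewrite(new byte[4], new byte[] {1, 2, 3, 4, 'X', 'X'}, "ROWDATA"));
    }
}
```

The second call returns the data with its leading bytes replaced by the tail of the larger header, which is the corruption described above.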
I found this while looking at optimizing the serialization of data to disk (and
thus improving write throughput). I removed the ObjectOutputStream
serialization in BitSetSerializer and replaced it with a custom serialization
that omits the generic Java serialization overhead and writes only the "true"
bits. The custom serialization itself worked fine, but it broke other parts of
the code: when new bits were turned on in the header BitSet, the header grew
and data segment bytes were overwritten.
The Java-serialized form of a BitSet can grow in a similar manner; no promises
of size or consistency are made. With current use it luckily doesn't seem to
happen.
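
A quick way to check this is to measure the ObjectOutputStream output for a BitSet before and after setting a high bit. The sketch below is standalone; BitSetSizeDemo and serializedSize are hypothetical names, and the exact byte counts depend on the JDK, so none are quoted here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.BitSet;

class BitSetSizeDemo
{
    /** Returns the number of bytes ObjectOutputStream produces for the given BitSet. */
    static int serializedSize(BitSet bits)
    {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer))
        {
            out.writeObject(bits);
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    public static void main(String[] args)
    {
        BitSet bits = new BitSet();   // default capacity
        bits.set(0);
        int before = serializedSize(bits);

        bits.set(200);                // forces the backing long[] to grow
        int after = serializedSize(bits);

        System.out.println(before + " -> " + after + " bytes");
    }
}
```

The serialized form includes the backing long[], so once a bit beyond the current capacity is set, the output gets longer. A BitSet whose bits all stay within its existing capacity tends to keep the same serialized size, which may be why the header rewrite has not caused corruption so far.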
A good fix is unclear. Unless the header is forced to a fixed, constant size in
some manner, this problem could pop up at any point. It's generally not safe
to rewrite headers in place like this without custom code that ensures the size
doesn't change. One fix would be to write all of the header data out manually
(rather than relying on Java serialization, and on serialization code in other
parts of Cassandra, not to change). Another might be to pad the header so that
the data inside can grow, but that seems fraught with potential problems. (I've
played around with padding the header length, but that seems to cause other
things to break, which I haven't been able to track down yet.)
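
For discussion, the padding idea could look roughly like the sketch below: the header always occupies a fixed number of bytes, with a length prefix so the reader knows how much of the block is real payload, and an oversized payload is rejected instead of silently spilling into the data segment. FixedSizeHeader, HEADER_SIZE, pad and unpad are all hypothetical names, 64 bytes is an arbitrary reserved size, and this does not address whatever broke in the padding experiment mentioned above.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class FixedSizeHeader
{
    /** Hypothetical reserved header size in bytes, including the 4-byte length prefix. */
    static final int HEADER_SIZE = 64;

    /** Returns a HEADER_SIZE-byte block: 4-byte payload length, payload, then zero padding. */
    static byte[] pad(byte[] payload)
    {
        if (payload.length > HEADER_SIZE - Integer.BYTES)
            throw new IllegalArgumentException(
                "header payload of " + payload.length + " bytes exceeds the reserved space");

        ByteBuffer block = ByteBuffer.allocate(HEADER_SIZE); // zero-filled
        block.putInt(payload.length);
        block.put(payload);
        return block.array();
    }

    /** Recovers the original payload from a block produced by pad(). */
    static byte[] unpad(byte[] block)
    {
        ByteBuffer buffer = ByteBuffer.wrap(block);
        byte[] payload = new byte[buffer.getInt()];
        buffer.get(payload);
        return payload;
    }

    public static void main(String[] args)
    {
        byte[] small = pad(new byte[] {1, 2, 3});
        byte[] large = pad(new byte[40]);
        // Both blocks have the same length, so rewriting one over the other is safe.
        System.out.println(small.length + " " + large.length);
        System.out.println(Arrays.toString(unpad(small)));
    }
}
```

With something like this, the bytes passed to seekAndWriteCommitLogHeader would always have the same length, and a header that outgrows the reserved space fails loudly at write time rather than corrupting the data that follows it.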