[
https://issues.apache.org/jira/browse/CASSANDRA-836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839501#action_12839501
]
Ross M commented on CASSANDRA-836:
----------------------------------
i'm not a moron. that was an example attached to the bug that quickly caused
the problem to happen.
that said, it actually is smaller than the java serialization of the BitSet {2,
4} of length 5. the int version stores the ints 5, 2, 2, 4 (128 bits). in the
java serialization, the "java.util.BitSet" portion alone takes up 128 bits, then
you have at least 64 bits for the actual bit values, plus anything else it
sticks in there. there apparently are other things, because the default version
results in ~592 bits of data. even with all 5 bits turned on, the int version
would take only 224 bits.
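
To make the arithmetic above easy to check, here is a minimal sketch that compares the two encodings. It is not the attached BitSetSerializer.java; the class name `BitSetSizeDemo` and the `compact` layout (length, number of set bits, then each set index, all as 32-bit ints) are assumptions inferred from the "5, 2, 2, 4" example. The exact size of the default Java serialization depends on the JDK, so the sketch prints it rather than assuming a value.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.BitSet;

public class BitSetSizeDemo {
    /**
     * Hypothetical compact format (an assumption, not Cassandra's code):
     * logical length, count of set bits, then the index of each set bit,
     * every value written as a 32-bit int.
     */
    static byte[] compact(BitSet bits, int length) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(length);
        out.writeInt(bits.cardinality());
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            out.writeInt(i);
        }
        out.flush();
        return buf.toByteArray();
    }

    /** Default Java serialization via ObjectOutputStream. */
    static byte[] javaSerialized(BitSet bits) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(bits);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        BitSet bits = new BitSet(5);
        bits.set(2);
        bits.set(4);
        // 4 ints * 32 bits = 128 bits for {2, 4} of length 5
        System.out.println("compact bits: " + compact(bits, 5).length * 8);
        // size varies by JDK, so it is printed rather than assumed
        System.out.println("java serialization bits: " + javaSerialized(bits).length * 8);
    }
}
```

Note that the compact form grows by 32 bits for every additional set bit, which is exactly the property that exposes the header-size assumption described in the issue.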
> CommitLogSegment::seekAndWriteCommitLogHeader assumes header size doesn't
> change.
> ---------------------------------------------------------------------------------
>
> Key: CASSANDRA-836
> URL: https://issues.apache.org/jira/browse/CASSANDRA-836
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: n/a - all
> Reporter: Ross M
> Priority: Minor
> Attachments: BitSetSerializer.java
>
>
> CommitLogSegment::seekAndWriteCommitLogHeader assumes header size doesn't
> grow. there are pieces of the header (BitSet) that are serialized with java
> serialization which makes no such promises.
> the following code:
>     /** writes header at the beginning of the file, then seeks back to current position */
>     void seekAndWriteCommitLogHeader(byte[] bytes) throws IOException
>     {
>         long currentPos = logWriter.getFilePointer();
>         logWriter.seek(0);
>         writeCommitLogHeader(bytes);
>         logWriter.seek(currentPos);
>     }
> works fine as long as the header size doesn't change, but if it grows, the new
> header will overwrite the beginning of the data segment. the bit-set being
> written in the header happens to serialize to the same size, but there is no
> guarantee of this.
> i found this when looking at optimizing the serialization of data to disk
> (thus improving write throughput/performance.) i removed the
> ObjectOutputStream serialization in BitSetSerializer and replaced it with a
> custom serialization that omits the generic java
> serialization/ObjectOutputStream stuff and writes only the "true" bits.
> the custom serialization worked fine, but broke other parts of the code: when
> the header bitset had new bits turned on, the header grew and data segment
> bytes were overwritten.
> the serialized version of a BitSet can grow in a similar manner; no promises
> of size/consistency are made, but with current use it luckily doesn't seem to
> happen.
> a good fix is unclear. without forcing the header to be a fixed/constant size
> in some manner this problem could pop up at any point. it's generally not
> safe to rewrite headers like this without custom code that ensures the size
> doesn't change. one fix would be to manually write all of the header data out
> (rather than relying on java serialization and serialization code in other
> parts of cassandra not to change.) another might be to pad the size of the
> header so that the data inside can grow, but that seems fraught with
> (potential) problems. (i've played around with padding the header length, but
> that seems to cause other things to break, which i haven't been able to track
> down yet.)
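
As an illustration of the "fixed/constant size" option mentioned in the description, here is a minimal sketch of a header rewrite that can never spill into the data segment. It is not Cassandra's code: `FixedHeaderDemo`, `HEADER_CAPACITY`, and the length-prefix layout are all hypothetical, and the sketch does not address whatever broke when padding was tried in the real code base. The idea is to reserve a fixed region, zero-pad the header up to it, and fail loudly if the serialized header would not fit.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class FixedHeaderDemo {
    /** Assumed size of the reserved header region, in bytes (hypothetical value). */
    static final int HEADER_CAPACITY = 64;

    /** Rewrites the header in place, padded to HEADER_CAPACITY, then seeks back. */
    static void writeHeader(RandomAccessFile file, byte[] header) throws IOException {
        // 4 bytes of the region are used for the length prefix
        if (header.length > HEADER_CAPACITY - 4) {
            throw new IOException("header of " + header.length
                    + " bytes does not fit in the reserved region");
        }
        long currentPos = file.getFilePointer();
        file.seek(0);
        file.writeInt(header.length);
        file.write(header);
        // zero padding keeps the region the same size however the header changes
        file.write(new byte[HEADER_CAPACITY - 4 - header.length]);
        // never resume writing inside the header region
        file.seek(Math.max(currentPos, HEADER_CAPACITY));
    }

    /** Writes a header and some data, grows the header, and returns the data read back. */
    static byte[] demo() throws IOException {
        File tmp = File.createTempFile("commitlog-demo", ".bin");
        tmp.deleteOnExit();
        try (RandomAccessFile file = new RandomAccessFile(tmp, "rw")) {
            writeHeader(file, new byte[] {1});
            file.write("data".getBytes(StandardCharsets.US_ASCII));
            writeHeader(file, new byte[] {1, 2, 3}); // the header grew
            byte[] data = new byte[4];
            file.seek(HEADER_CAPACITY);
            file.readFully(data);
            return data;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(new String(demo(), StandardCharsets.US_ASCII));
    }
}
```

The trade-off is that the capacity has to be chosen up front; a header that outgrows it raises an error instead of silently corrupting the first data entries.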