[
https://issues.apache.org/jira/browse/CASSANDRA-4305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291708#comment-13291708
]
Pavel Yaskevich commented on CASSANDRA-4305:
--------------------------------------------
Ok, it is kind of pointless to argue about what can happen in the future, but
even from your examples it makes a lot of sense to guarantee RM integrity if we
are to send it to yet another thread or require it in the CL. Otherwise you
very much risk persisting corrupted data at some point (we don't have a
mechanism to reject modifications). As the amount of processing in Table.apply
grows, so does the probability of unnoticed corruption, e.g. when secondary
index code modifies a CF or its columns by mistake while racing with triggers
or the CL, which would lead to a very bad situation. Even if we are to somehow
"optimize so that serialize the RM directly to the file (to avoid a copy)", we
still need to convert it into a writable form, don't we? And that's where we
would have to make a hundred and five assertions just to notice whether the
calculated size matches the actual data size (like we do in
FBUtilities.serialize()), because we would race with other components using
the same mutation. We don't have full control over the indexing code anymore,
and even if the corruption is not our mistake per se, we share a good part of
the guilt just because our design decisions let it happen, which in turn would
make a negative impression overall.
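To make the race concrete, here is a minimal, deterministic sketch. It is not
Cassandra code: Mutation is a hypothetical stand-in for RowMutation, and its
predictedSize()/serialize() methods only imitate the "predicted vs. actual
size" check mentioned above. The "other component" is simulated by a plain
method call instead of a second thread so that the outcome is reproducible.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class MutationSnapshotSketch {
    /** Hypothetical stand-in for a RowMutation: a mutable list of column values. */
    static final class Mutation {
        private final List<byte[]> columns = new ArrayList<>();

        void add(String value) {
            columns.add(value.getBytes(StandardCharsets.UTF_8));
        }

        /** Expected serialized size: a 4-byte count plus length-prefixed values. */
        int predictedSize() {
            int size = 4;
            for (byte[] c : columns) {
                size += 4 + c.length;
            }
            return size;
        }

        /** Serializes the current state into a freshly allocated byte array. */
        byte[] serialize() {
            ByteBuffer buf = ByteBuffer.allocate(predictedSize());
            buf.putInt(columns.size());
            for (byte[] c : columns) {
                buf.putInt(c.length);
                buf.put(c);
            }
            return buf.array();
        }
    }

    /** Late serialization: size is predicted at enqueue time, bytes are produced later. */
    public static boolean lateSerializationMatches() {
        Mutation m = new Mutation();
        m.add("stringId");
        int predicted = m.predictedSize();  // taken when the mutation is queued
        m.add("added-by-index-code");       // another component modifies it meanwhile
        return m.serialize().length == predicted;
    }

    /** Eager serialization: the writer only ever sees bytes captured at enqueue time. */
    public static boolean eagerSerializationMatches() {
        Mutation m = new Mutation();
        m.add("stringId");
        int predicted = m.predictedSize();
        byte[] snapshot = m.serialize();    // snapshot handed to the writer
        m.add("added-by-index-code");       // a later change cannot affect the snapshot
        return snapshot.length == predicted;
    }

    public static void main(String[] args) {
        System.out.println("late serialization matches:  " + lateSerializationMatches());
        System.out.println("eager serialization matches: " + eagerSerializationMatches());
    }
}
```

In this sketch the late variant reports a mismatch and the eager variant does
not. The single copy made at enqueue time is what keeps the queued data
independent of whatever the other components do afterwards.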
bq. Furthermore, I have doubt that cloning the CF you're reusing before passing
them to RM in your 2ndary index code will have a measurable impact on
performance (though if you have numbers to show that it does make a noticeable
difference, then it's a different discussion).
This is a double standard: why do we try so hard to avoid making one copy for
serialization, but instead require secondary index code to clone possibly
every CF, and to do that at the same stage of the write path? I'm talking
about cfs.indexManager.applyIndexUpdates() in Table.apply, for example.
> CF serialization failure when working with custom secondary indices.
> --------------------------------------------------------------------
>
> Key: CASSANDRA-4305
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4305
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.0.10
> Reporter: Pavel Yaskevich
> Labels: datastax_qa
> Attachments: CASSANDRA-4305.patch
>
>
> The assertion below was triggered when a client was adding new rows to
> Solr-backed secondary indices (1000-row batch without any timeout).
> {noformat}
> ERROR [COMMIT-LOG-WRITER] 2012-05-30 16:39:02,896
> AbstractCassandraDaemon.java (line 139) Fatal exception in thread
> Thread[COMMIT-LOG-WRITER,5,main]
> java.lang.AssertionError: Final buffer length 176 to accomodate data size of
> 123 (predicted 87) for RowMutation(keyspace='solrTest1338395932411',
> key='6b6579383039', modifications=[ColumnFamily(cf1
> [long:false:8@1338395942384024,stringId:false:13@1338395940586003,])])
> at
> org.apache.cassandra.utils.FBUtilities.serialize(FBUtilities.java:682)
> at
> org.apache.cassandra.db.RowMutation.getSerializedBuffer(RowMutation.java:279)
> at
> org.apache.cassandra.db.commitlog.CommitLogSegment.write(CommitLogSegment.java:122)
> at
> org.apache.cassandra.db.commitlog.CommitLog$LogRecordAdder.run(CommitLog.java:600)
> at
> org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService$1.runMayThrow(PeriodicCommitLogExecutorService.java:49)
> at
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
> at java.lang.Thread.run(Thread.java:662)
> {noformat}
> After investigation it was clear that this was happening because we were
> holding instances of RowMutation, queued for addition to the CommitLog, until
> the actual "write" moment, which is redundant.