[jira] [Commented] (CASSANDRA-4305) CF serialization failure when working with custom secondary indices.

Pavel Yaskevich (JIRA) Fri, 08 Jun 2012 04:28:29 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-4305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291708#comment-13291708
 ]


Pavel Yaskevich commented on CASSANDRA-4305:
--------------------------------------------

OK, it is somewhat pointless to argue about what might happen in the future, 
but even your examples show that it makes a lot of sense to guarantee RM 
integrity if we are going to hand it to yet another thread or require it in the 
CL. Otherwise we very much risk persisting corrupted data at some point (we 
have no mechanism to reject modifications): as the amount of processing in 
Table.apply grows, so does the probability of unnoticed corruption, e.g. when 
secondary index code modifies the cf or its columns by mistake, racing with 
triggers or the CL, which would lead to a very bad situation. Even if we were 
to somehow "optimize so that serialize the RM directly to the file (to avoid a 
copy)", we would still need to convert it into a writable form, wouldn't we? 
And that is where we would have to make a hundred and five assertions just to 
notice whether the calculated size matches the actual data size (like we do in 
FBUtilities.serialize()), because we would be racing with other components 
using the same mutation. We no longer have full control over the indexing code, 
and even if the corruption is not our mistake per se, we share a good part of 
the guilt because our design decisions let it happen, which in turn would make 
a negative impression overall.
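
To illustrate the kind of size check meant here, below is a minimal Java 
sketch. It is not Cassandra code: the Mutation class, serializeChecked() and 
the betweenSizeAndWrite hook are hypothetical names used only to show how a 
modification that lands between the size calculation and the write makes the 
predicted and actual sizes disagree.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SizeCheckSketch {
    /** Hypothetical stand-in for a mutation; not Cassandra's RowMutation. */
    static final class Mutation {
        final List<byte[]> columns = new ArrayList<>();

        void add(String value) {
            columns.add(value.getBytes(StandardCharsets.UTF_8));
        }

        /** Predicted size: 4-byte column count, then 4-byte length plus payload per column. */
        int serializedSize() {
            int size = 4;
            for (byte[] c : columns) size += 4 + c.length;
            return size;
        }

        void serialize(DataOutputStream out) throws IOException {
            out.writeInt(columns.size());
            for (byte[] c : columns) {
                out.writeInt(c.length);
                out.write(c);
            }
        }
    }

    /** Serializes and verifies that the bytes written match the size predicted up front. */
    static byte[] serializeChecked(Mutation m, Runnable betweenSizeAndWrite) throws IOException {
        int predicted = m.serializedSize();
        betweenSizeAndWrite.run(); // simulates another component touching the mutation
        ByteArrayOutputStream bytes = new ByteArrayOutputStream(predicted);
        m.serialize(new DataOutputStream(bytes));
        if (bytes.size() != predicted)
            throw new IllegalStateException(
                "wrote " + bytes.size() + " bytes, predicted " + predicted);
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Mutation m = new Mutation();
        m.add("long");
        // Nothing touches the mutation: the check passes.
        System.out.println("ok, " + serializeChecked(m, () -> {}).length + " bytes");
        try {
            // A column is added after the size was computed: the check fails.
            serializeChecked(m, () -> m.add("stringId"));
        } catch (IllegalStateException e) {
            System.out.println("detected: " + e.getMessage());
        }
    }
}
```

The check only detects the problem after the fact; it does nothing to stop 
the shared mutation from being modified in the first place.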

bq. Furthermore, I have doubt that cloning the CF you're reusing before passing 
them to RM in your 2ndary index code will have a measurable impact on 
performance (though if you have numbers to show that it does make a noticeable 
difference, then it's a different discussion).

This is a double standard: why do we try so hard to avoid making one copy for 
serialization, yet require the secondary index to clone possibly every CF, and 
to do so at the same stage of the write path? I'm talking about 
cfs.indexManager.applyIndexUpdates() in Table.apply, for example.
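
As a rough illustration of the "one copy for serialization" side of this 
trade-off, here is a small Java sketch. It assumes a hypothetical Mutation 
class and logQueue, neither of which is Cassandra's API: the mutation is 
serialized once, on the calling thread, and only the resulting bytes are 
queued, so later changes to the live object cannot reach the log writer.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class EagerSerializeSketch {
    /** Hypothetical mutable payload shared with other write-path components. */
    static final class Mutation {
        final StringBuilder data = new StringBuilder();

        byte[] toBytes() {
            return data.toString().getBytes(StandardCharsets.UTF_8);
        }
    }

    // The queue holds byte snapshots, never the live Mutation object.
    static final BlockingQueue<byte[]> logQueue = new LinkedBlockingQueue<>();

    /** Takes the single copy at enqueue time, on the caller's thread. */
    static void enqueue(Mutation m) {
        logQueue.add(m.toBytes());
    }

    public static void main(String[] args) {
        Mutation m = new Mutation();
        m.data.append("v1");
        enqueue(m);
        // Another component modifies the mutation after it was queued.
        m.data.append("-changed");
        byte[] logged = logQueue.poll();
        System.out.println(new String(logged, StandardCharsets.UTF_8));
    }
}
```

With this arrangement the index code can keep reusing its CF objects without 
cloning them, at the cost of the one serialization copy.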
                
> CF serialization failure when working with custom secondary indices.
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-4305
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4305
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.10
>            Reporter: Pavel Yaskevich
>              Labels: datastax_qa
>         Attachments: CASSANDRA-4305.patch
>
>
> Assertion (below) was triggered when client was adding new rows to 
> Solr-backed secondary indices (1000-row batch without any timeout).
> {noformat}
> ERROR [COMMIT-LOG-WRITER] 2012-05-30 16:39:02,896 
> AbstractCassandraDaemon.java (line 139) Fatal exception in thread 
> Thread[COMMIT-LOG-WRITER,5,main]
> java.lang.AssertionError: Final buffer length 176 to accomodate data size of 
> 123 (predicted 87) for RowMutation(keyspace='solrTest1338395932411', 
> key='6b6579383039', modifications=[ColumnFamily(cf1 
> [long:false:8@1338395942384024,stringId:false:13@1338395940586003,])])
>         at 
> org.apache.cassandra.utils.FBUtilities.serialize(FBUtilities.java:682)
>         at 
> org.apache.cassandra.db.RowMutation.getSerializedBuffer(RowMutation.java:279)
>         at 
> org.apache.cassandra.db.commitlog.CommitLogSegment.write(CommitLogSegment.java:122)
>         at 
> org.apache.cassandra.db.commitlog.CommitLog$LogRecordAdder.run(CommitLog.java:600)
>         at 
> org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService$1.runMayThrow(PeriodicCommitLogExecutorService.java:49)
>         at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>         at java.lang.Thread.run(Thread.java:662)
> {noformat}
> After investigation it was clear that this was happening because we were 
> holding RowMutation instances, queued for addition to the CommitLog, until 
> the actual "write" moment, which is redundant.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
