[
https://issues.apache.org/jira/browse/CASSANDRA-12519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17362042#comment-17362042
]
Stefania Alborghetti commented on CASSANDRA-12519:
--------------------------------------------------
{quote}
I'm not familiarized with the lifecycle package so I'm not sure whether
skipping the temporary sstables when resetting the levels is right, or whether
the validation error that happens after changing the metadata is caused by a
deeper problem.
{quote}
I would need to see the full reason why the transaction rejected a record, but I
wasn't able to find a complete failure report. Still, it must have failed the
checksum verification, because the standalone tools change the metadata file,
{{sstablelevelreset}} in our case.
The transaction checks whether anything has tampered with a file that it guards.
This is done by {{LogFile.verify()}}, and a failure here would also prevent the
main Cassandra process from starting up, because some automated cleanup is done
on startup when {{LogTransaction.removeUnfinishedLeftovers()}} is called. Since
we don't want to mistakenly delete files restored by users, for example, we
check a checksum that is calculated from the files that existed when the
transaction record was created. There are more checks, but this is the main one
and the one that I believe must have failed.
So if anything changes any of these files, temporary or permanent, the
transaction detects it. These two standalone tools change the sstable metadata
and hence probably triggered it.
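To make the check described above concrete, here is a minimal sketch in Java. It is not Cassandra's {{LogFile}} or {{LogRecord}} code: the class {{GuardedFilesSketch}}, the {{FileState}} holder, and the choice of a CRC32 over each file's name, size, and modification time are assumptions made purely for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.CRC32;

// Illustrative only; not Cassandra's LogFile/LogRecord implementation.
final class GuardedFilesSketch {

    // Hypothetical snapshot of one file guarded by a transaction record.
    static final class FileState {
        final String name;
        final long sizeBytes;
        final long lastModifiedMillis;

        FileState(String name, long sizeBytes, long lastModifiedMillis) {
            this.name = name;
            this.sizeBytes = sizeBytes;
            this.lastModifiedMillis = lastModifiedMillis;
        }
    }

    // Checksum computed when the record is created (assumed inputs: name, size, mtime).
    static long checksum(List<FileState> files) {
        CRC32 crc = new CRC32();
        for (FileState f : files) {
            String entry = f.name + ':' + f.sizeBytes + ':' + f.lastModifiedMillis + '\n';
            crc.update(entry.getBytes(StandardCharsets.UTF_8));
        }
        return crc.getValue();
    }

    // Later check: recompute from the files as they are now and compare.
    static boolean verify(List<FileState> currentFiles, long recordedChecksum) {
        return checksum(currentFiles) == recordedChecksum;
    }
}
```

Under these assumptions, a tool that rewrites a metadata file changes its size or modification time, so {{verify}} returns false for the record that was written before the tool ran.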
I think it's reasonable to change {{sstablelevelreset}} to skip temporary
files, because if the transaction did not complete, it's as if these files
never existed. However, I don't think this is sufficient to fix the problem,
because changing the old existing metadata files could also trigger a checksum
error. So I may be wrong, but it seems to me that the real fix is to use the
cleanup utility in the test before running {{sstablelevelreset}}, so that there
are no leftover transactions.
If these two tools are likely to be used directly by users when the process
is offline, as they seem to be, I believe that they should clean up leftover
transactions first, or at least issue a warning if there are any. Otherwise the
main process may refuse to start for the reason explained above. To clean up
leftovers we can simply call
{{LifecycleTransaction.removeUnfinishedLeftovers(cfs)}} from the tool itself,
before doing any work. We should consider a follow-up to do this, or fix this
directly in this ticket. If we fix this here, then we don't need to do this in
the test.
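The ordering suggested above (clean up first, warn if leftovers remain, then do the tool's work) could look roughly like the sketch below. It is self-contained and does not use Cassandra classes: {{OfflineToolSketch}} is a hypothetical name, the {{cleanup}} argument stands in for {{LifecycleTransaction.removeUnfinishedLeftovers(cfs)}}, and treating its boolean result as "nothing was left behind" is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Illustrative only; the real tools would call into Cassandra directly.
final class OfflineToolSketch {

    // Runs the cleanup step before any work and returns the steps taken, in order.
    static List<String> run(BooleanSupplier cleanup, Runnable work) {
        List<String> steps = new ArrayList<>();
        // Stand-in for LifecycleTransaction.removeUnfinishedLeftovers(cfs).
        if (cleanup.getAsBoolean()) {
            steps.add("cleanup ok");
        } else {
            // At minimum, tell the user that leftover transactions remain.
            steps.add("WARNING: leftover transactions remain");
        }
        // Only now touch the sstable metadata.
        work.run();
        steps.add("work done");
        return steps;
    }
}
```

Whether the tool should continue or abort when cleanup fails is a design choice; the sketch only warns, which is the weaker of the two options mentioned above.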
So you can either merge what you have and open a follow-up, or add
{{LifecycleTransaction.removeUnfinishedLeftovers(cfs)}}, as well as skipping the
temporary files (which seems more correct to me), and see if this fixes it
without changing the test.
> dtest failure in
> offline_tools_test.TestOfflineTools.sstableofflinerelevel_test
> -------------------------------------------------------------------------------
>
> Key: CASSANDRA-12519
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12519
> Project: Cassandra
> Issue Type: Improvement
> Components: Test/dtest/python
> Reporter: Sean McCarthy
> Assignee: Andres de la Peña
> Priority: Normal
> Fix For: 4.0-rc2, 4.0, 3.0.x, 3.11.x, 4.0-rc, 4.x
>
> Attachments: node1.log, node1_debug.log, node1_gc.log
>
>
> example failure:
> http://cassci.datastax.com/job/trunk_offheap_dtest/379/testReport/offline_tools_test/TestOfflineTools/sstableofflinerelevel_test/
> {code}
> Stacktrace
> File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
> File "/home/automaton/cassandra-dtest/offline_tools_test.py", line 209, in sstableofflinerelevel_test
> self.assertGreater(max(final_levels), 1)
> File "/usr/lib/python2.7/unittest/case.py", line 942, in assertGreater
> self.fail(self._formatMessage(msg, standardMsg))
> File "/usr/lib/python2.7/unittest/case.py", line 410, in fail
> raise self.failureException(msg)
> "1 not greater than 1
> {code}