[jira] [Commented] (CASSANDRA-10112) Refuse to start and print txn log information in case of disk corruption
[ https://issues.apache.org/jira/browse/CASSANDRA-10112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16296532#comment-16296532 ]

Stefania commented on CASSANDRA-10112:
--------------------------------------

I cannot recall any specific reason, so I am guessing this was considered an improvement.

> Refuse to start and print txn log information in case of disk corruption
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-10112
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10112
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local Write-Read Paths
>            Reporter: Stefania
>            Assignee: Stefania
>              Labels: doc-impacting
>             Fix For: 3.6
>
>
> Transaction logs were introduced by CASSANDRA-7066 and are read during
> start-up. In case of file system errors, such as disk corruption, we
> currently log a panic error and leave the sstable files and transaction logs
> as they are; this is to avoid rolling back a transaction (i.e. deleting
> files) by mistake.
> We should instead look at the {{disk_failure_policy}} and refuse to start
> unless the failure policy is {{ignore}}.
> We should also consider stashing files that cannot be read during startup,
> either transaction logs or sstables, by moving them to a dedicated
> sub-folder.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10112) Refuse to start and print txn log information in case of disk corruption
[ https://issues.apache.org/jira/browse/CASSANDRA-10112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16296464#comment-16296464 ]

Marcus Eriksson commented on CASSANDRA-10112:
---------------------------------------------

Does anyone remember why we didn't commit this to 3.0 as well?
[jira] [Commented] (CASSANDRA-10112) Refuse to start and print txn log information in case of disk corruption
[ https://issues.apache.org/jira/browse/CASSANDRA-10112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15182468#comment-15182468 ]

Stefania commented on CASSANDRA-10112:
--------------------------------------

Thank you for the review.

bq. Can you verify that the failing {{org.apache.cassandra.io.sstable.SSTableWriterTest.testAbortTxnWithOpenEarlyShouldRemoveSSTable}} utest is not a regression?

It passes locally and it failed on trunk as well, see [build 751|http://cassci.datastax.com/job/trunk_testall/751/testReport/org.apache.cassandra.io.sstable/SSTableWriterTest/testAbortTxnWithOpenEarlyShouldRemoveSSTable].

bq. It would be nice to use constants instead of magic numbers for {{StartupException}} exit status codes.

I've introduced 3 generic constants (1: wrong machine state, 3: wrong disk state, 100: wrong config). I had to change the JNA unavailable exit error from 3 to 1. We could make the constants more specific, but we'd have to change more exit codes.

bq. In {{LogRecord.make()}}, why do we catch {{Throwable}}? Should we be passing that through {{JVMStabilityInspector}}?

As far as I remember, it was to catch the exceptions thrown by the {{valueOf()}} methods. I don't see anything else that could throw, so I've replaced {{Throwable}} with {{IllegalArgumentException}}.

bq. {{removeUnfinishedCompactionLeftovers()}} could use some javadocs (especially explaining the return value).

Added some comments to {{LogTransaction.removeUnfinishedLeftovers()}}.

bq. I have a slight preference for using the term "directories" instead of "folders" (but it's not worth changing existing code for this)

You're quite right, folder is a Windows Explorer concept and it is not necessarily a directory. It didn't take long, so I've changed the mentions of "folder" that I could find in the {{Log*}} files in {{db.lifecycle}}.

bq. I think this ticket needs a {{doc-impacting}} label

Added.

I've restarted one more CI run.
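For readers following the exit-code discussion, here is a minimal sketch of what named exit status constants could look like. The numeric values (1, 3, 100) and their meanings come from the comment above; the class name {{StartupFailureSketch}}, the constant names and the {{checkTransactionLogs()}} helper are hypothetical illustrations and are not taken from the patch.

```java
// Illustrative sketch only: class, constant and method names are assumptions,
// not the actual StartupException from the patch.
public class StartupFailureSketch extends Exception {
    // Values as listed in the comment: 1, 3 and 100.
    public static final int ERR_WRONG_MACHINE_STATE = 1;   // e.g. JNA unavailable
    public static final int ERR_WRONG_DISK_STATE = 3;      // e.g. unreadable transaction log
    public static final int ERR_WRONG_CONFIG = 100;        // e.g. invalid configuration

    // Exit status the process should terminate with.
    public final int returnCode;

    public StartupFailureSketch(int returnCode, String message) {
        super(message);
        this.returnCode = returnCode;
    }

    // Hypothetical start-up check: fails with the disk-state code
    // when the transaction logs could not be read.
    static void checkTransactionLogs(boolean logsReadable) throws StartupFailureSketch {
        if (!logsReadable)
            throw new StartupFailureSketch(ERR_WRONG_DISK_STATE,
                                           "Unexpected disk state: failed to read transaction log");
    }

    public static void main(String[] args) {
        try {
            // With no arguments the logs are treated as readable and start-up continues.
            checkTransactionLogs(args.length == 0);
        } catch (StartupFailureSketch e) {
            System.err.println(e.getMessage());
            System.exit(e.returnCode);
        }
    }
}
```

A caller that maps the exception to a process exit status no longer needs to know the raw numbers, which is the point of replacing the magic values.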
[jira] [Commented] (CASSANDRA-10112) Refuse to start and print txn log information in case of disk corruption
[ https://issues.apache.org/jira/browse/CASSANDRA-10112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178729#comment-15178729 ]

Tyler Hobbs commented on CASSANDRA-10112:
-----------------------------------------

Overall the patch looks good. Can you verify that the failing {{org.apache.cassandra.io.sstable.SSTableWriterTest.testAbortTxnWithOpenEarlyShouldRemoveSSTable}} utest is not a regression?

Other than that, I just have a few nitpicks:
* It would be nice to use constants instead of magic numbers for {{StartupException}} exit status codes.
* In {{LogRecord.make()}}, why do we catch {{Throwable}}? Should we be passing that through {{JVMStabilityInspector}}?
* {{removeUnfinishedCompactionLeftovers()}} could use some javadocs (especially explaining the return value).
* I have a slight preference for using the term "directories" instead of "folders" (but it's not worth changing existing code for this)
* I think this ticket needs a {{doc-impacting}} label
[jira] [Commented] (CASSANDRA-10112) Refuse to start and print txn log information in case of disk corruption
[ https://issues.apache.org/jira/browse/CASSANDRA-10112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168663#comment-15168663 ]

Stefania commented on CASSANDRA-10112:
--------------------------------------

As suggested by [~benedict] above, this patch simply prints out all contents of transaction logs with problems and stops the start-up, regardless of disk failure policy. I've updated the title of this ticket accordingly.

|[patch|https://github.com/stef1927/cassandra/commits/10112]|
|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10112-testall/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10112-dtest/]|

Sample output:

{code}
ERROR 08:37:41 Unexpected disk state: failed to read transaction log [ma_txn_compaction_d291e2b0-dc62-11e5-8d7c-8933a8fd4210.log in folders /home/stefi/git/cstar/cassandra/bin/../data/data/keyspace1/standard1-a0509e70dc5f11e5a8dffbb9e667c513]
Files and contents follow:
/home/stefi/git/cstar/cassandra/bin/../data/data/keyspace1/standard1-a0509e70dc5f11e5a8dffbb9e667c513/ma_txn_compaction_d291e2b0-dc62-11e5-8d7c-8933a8fd4210.log
    ADD:[/home/stefi/git/cstar/cassandra/data/data/keyspace1/standard1-a0509e70dc5f11e5a8dffbb9e667c513/ma-4-big,0,8][4101796893]
    REMOVE:[/home/stefi/git/cstar/cassandra/data/data/keyspace1/standard1-a0509e70dc5f11e5a8dffbb9e667c513/ma-2-big,0,8][4101796893]
        ***Invalid checksum for sstable [ma-2-big]: [4101796893] should have been [2686116883]
    REMOVE:[/home/stefi/git/cstar/cassandra/data/data/keyspace1/standard1-a0509e70dc5f11e5a8dffbb9e667c513/ma-3-big,0,8][4101796893]

ERROR 08:37:41 Cannot remove temporary or obsoleted files for keyspace1.standard1 due to a problem with transaction log files. Please check records with problems in the log messages above and fix them. Refer to the 3.0 upgrading instructions in NEWS.txt for a description of transaction log files.
{code}
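To make the "Invalid checksum" line in the sample output easier to interpret, here is a rough sketch of how a record of the form TYPE:[body][checksum] might be parsed and verified. Everything in it is an assumption made for illustration: the class {{TxnLogRecordSketch}}, the regular expression, and in particular the choice of computing a CRC32 over the text before the checksum bracket. The actual fields covered by the checksum in Cassandra's {{LogRecord}} may differ. It also shows why catching {{IllegalArgumentException}} is enough for the {{valueOf()}} calls: {{Enum.valueOf}} throws it for unknown names, and the {{NumberFormatException}} thrown by {{Long.valueOf}} is a subclass of it.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.CRC32;

// Illustrative sketch only: not the actual LogRecord implementation.
public class TxnLogRecordSketch {
    enum Type { ADD, REMOVE, COMMIT, ABORT }

    // Assumed layout, modelled on the sample output: TYPE:[body][checksum]
    static final Pattern REGEX = Pattern.compile("^(\\w+):\\[(.*)\\]\\[(\\d+)\\]$");

    final Type type;     // null when the line could not be parsed
    final String raw;    // the original line, kept so it can be printed with the error
    final String error;  // null when the record is valid

    TxnLogRecordSketch(Type type, String raw, String error) {
        this.type = type;
        this.raw = raw;
        this.error = error;
    }

    static TxnLogRecordSketch make(String line) {
        try {
            Matcher m = REGEX.matcher(line);
            if (!m.matches())
                return new TxnLogRecordSketch(null, line, "Failed to parse [" + line + "]");

            Type type = Type.valueOf(m.group(1));    // IllegalArgumentException if unknown
            long stored = Long.valueOf(m.group(3));  // NumberFormatException if out of range
            long expected = checksumOf(type, m.group(2));

            String error = stored == expected
                         ? null
                         : String.format("Invalid checksum: [%d] should have been [%d]", stored, expected);
            return new TxnLogRecordSketch(type, line, error);
        } catch (IllegalArgumentException e) {
            // Covers both valueOf() calls; nothing else in the try block is expected to throw.
            return new TxnLogRecordSketch(null, line, "Failed to parse [" + line + "]: " + e.getMessage());
        }
    }

    // Assumption: the checksum covers the record text that precedes the checksum bracket.
    static long checksumOf(Type type, String body) {
        CRC32 crc = new CRC32();
        crc.update((type + ":[" + body + "]").getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }
}
```

Returning a record that carries its own error, rather than throwing, lets the caller print every record of a problematic log together with the records that failed, which is the shape of the sample output above.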