[jira] [Commented] (CASSANDRA-10112) Refuse to start and print txn log information in case of disk corruption

2017-12-19 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16296532#comment-16296532
 ] 

Stefania commented on CASSANDRA-10112:
--

I cannot recall any specific reason, so I am guessing it was not committed to 
3.0 because it was considered an improvement.

> Refuse to start and print txn log information in case of disk corruption
> 
>
>          Key: CASSANDRA-10112
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-10112
>      Project: Cassandra
>   Issue Type: Improvement
>   Components: Local Write-Read Paths
>     Reporter: Stefania
>     Assignee: Stefania
>       Labels: doc-impacting
>      Fix For: 3.6
>
>
> Transaction logs were introduced by CASSANDRA-7066 and are read during 
> start-up. In case of file system errors, such as disk corruption, we 
> currently log a panic error and leave the sstable files and transaction logs 
> as they are; this is to avoid rolling back a transaction (i.e. deleting 
> files) by mistake.
> We should instead look at the {{disk_failure_policy}} and refuse to start 
> unless the failure policy is {{ignore}}. 
> We should also consider stashing files that cannot be read during startup, 
> either transaction logs or sstables, by moving them to a dedicated 
> sub-folder. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-10112) Refuse to start and print txn log information in case of disk corruption

2017-12-19 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16296464#comment-16296464
 ] 

Marcus Eriksson commented on CASSANDRA-10112:
-

Does anyone remember why we didn't commit this to 3.0 as well?




[jira] [Commented] (CASSANDRA-10112) Refuse to start and print txn log information in case of disk corruption

2016-03-06 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15182468#comment-15182468
 ] 

Stefania commented on CASSANDRA-10112:
--

Thank you for the review.

bq. Can you verify that the failing 
{{org.apache.cassandra.io.sstable.SSTableWriterTest.testAbortTxnWithOpenEarlyShouldRemoveSSTable}}
 utest is not a regression?

It passes locally, and it failed on trunk as well; see [build 
751|http://cassci.datastax.com/job/trunk_testall/751/testReport/org.apache.cassandra.io.sstable/SSTableWriterTest/testAbortTxnWithOpenEarlyShouldRemoveSSTable].

bq. It would be nice to use constants instead of magic numbers for 
{{StartupException}} exit status codes.

I've introduced 3 generic constants (1: wrong machine state, 3: wrong disk 
state, 100: wrong config). I had to change the exit code for the JNA 
unavailable error from 3 to 1. We could make the constants more specific, but 
then we'd have to change more exit codes.
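
For readers without the patch at hand, a minimal sketch of what such constants 
could look like. The class, constant, and field names here are hypothetical and 
chosen for illustration; only the three numeric values come from the comment 
above.

```java
// Hypothetical sketch: names are illustrative assumptions, only the values
// 1, 3 and 100 are taken from the comment above.
final class StartupExitCodes {
    static final int ERR_WRONG_MACHINE_STATE = 1; // e.g. JNA unavailable
    static final int ERR_WRONG_DISK_STATE = 3;    // e.g. unreadable transaction log
    static final int ERR_WRONG_CONFIG = 100;      // e.g. invalid configuration

    private StartupExitCodes() {}
}

// Stand-in for an exception that carries the exit status to the launcher.
final class StartupFailure extends Exception {
    final int returnCode;

    StartupFailure(int returnCode, String message) {
        super(message);
        this.returnCode = returnCode;
    }
}
```

A caller would then throw, for example, 
new StartupFailure(StartupExitCodes.ERR_WRONG_DISK_STATE, "...") rather than 
passing a bare 3.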

bq. In {{LogRecord.make()}}, why do we catch {{Throwable}}? Should we be 
passing that through {{JVMStabilityInspector}}?

As far as I remember, it was to catch the exceptions thrown by the 
{{valueOf()}} methods. I don't see anything else that could throw, so I've 
replaced {{Throwable}} with {{IllegalArgumentException}}.
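
To illustrate the narrowing, a minimal sketch of parsing a record type with 
{{Enum.valueOf()}}, which throws {{IllegalArgumentException}} for an unknown 
name. The class and enum below are hypothetical stand-ins, not the actual 
{{LogRecord}} code.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical stand-in for the record parsing discussed above.
final class RecordTypeParser {
    enum Type { ADD, REMOVE, COMMIT, ABORT }

    // Returns the parsed type, or empty if the text names no known type.
    static Optional<Type> parse(String raw) {
        try {
            return Optional.of(Type.valueOf(raw.trim().toUpperCase(Locale.ROOT)));
        } catch (IllegalArgumentException e) {
            // Only the failure valueOf() documents for unknown names is handled
            // here; anything else (e.g. a null input, OutOfMemoryError) still
            // propagates instead of being swallowed by a broad catch.
            return Optional.empty();
        }
    }
}
```

Catching the narrow exception type means serious JVM errors are never caught 
here in the first place, so there is nothing to hand to a stability inspector.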

bq. {{removeUnfinishedCompactionLeftovers()}} could use some javadocs 
(especially explaining the return value).

Added some comments to {{LogTransaction.removeUnfinishedLeftovers()}}.

bq. I have a slight preference for using the term "directories" instead of 
"folders" (but it's not worth changing existing code for this)

You're quite right, folder is a Windows Explorer concept and it is not 
necessarily a directory. It didn't take long, so I've changed the mentions of 
"folder" that I could find in the {{Log*}} files in {{db.lifecycle}} to 
"directory".

bq. I think this ticket needs a {{doc-impacting}} label

Added.

I've restarted one more CI run.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10112) Refuse to start and print txn log information in case of disk corruption

2016-03-03 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178729#comment-15178729
 ] 

Tyler Hobbs commented on CASSANDRA-10112:
-

Overall the patch looks good.

Can you verify that the failing 
{{org.apache.cassandra.io.sstable.SSTableWriterTest.testAbortTxnWithOpenEarlyShouldRemoveSSTable}}
 utest is not a regression?

Other than that, I just have a few nitpicks:
* It would be nice to use constants instead of magic numbers for 
{{StartupException}} exit status codes.
* In {{LogRecord.make()}}, why do we catch {{Throwable}}?  Should we be passing 
that through {{JVMStabilityInspector}}?
* {{removeUnfinishedCompactionLeftovers()}} could use some javadocs (especially 
explaining the return value).
* I have a slight preference for using the term "directories" instead of 
"folders" (but it's not worth changing existing code for this)
* I think this ticket needs a {{doc-impacting}} label




[jira] [Commented] (CASSANDRA-10112) Refuse to start and print txn log information in case of disk corruption

2016-02-26 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168663#comment-15168663
 ] 

Stefania commented on CASSANDRA-10112:
--

As suggested by [~benedict] above, this patch simply prints out all contents of 
transaction logs with problems and stops the start-up, regardless of disk 
failure policy. I've updated the title of this ticket accordingly.

|[patch|https://github.com/stef1927/cassandra/commits/10112]|
|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10112-testall/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10112-dtest/]|

Sample output:

{code}
ERROR 08:37:41 Unexpected disk state: failed to read transaction log 
[ma_txn_compaction_d291e2b0-dc62-11e5-8d7c-8933a8fd4210.log in folders 
/home/stefi/git/cstar/cassandra/bin/../data/data/keyspace1/standard1-a0509e70dc5f11e5a8dffbb9e667c513]
Files and contents follow:
/home/stefi/git/cstar/cassandra/bin/../data/data/keyspace1/standard1-a0509e70dc5f11e5a8dffbb9e667c513/ma_txn_compaction_d291e2b0-dc62-11e5-8d7c-8933a8fd4210.log

ADD:[/home/stefi/git/cstar/cassandra/data/data/keyspace1/standard1-a0509e70dc5f11e5a8dffbb9e667c513/ma-4-big,0,8][4101796893]

REMOVE:[/home/stefi/git/cstar/cassandra/data/data/keyspace1/standard1-a0509e70dc5f11e5a8dffbb9e667c513/ma-2-big,0,8][4101796893]
***Invalid checksum for sstable [ma-2-big]: [4101796893] should 
have been [2686116883]

REMOVE:[/home/stefi/git/cstar/cassandra/data/data/keyspace1/standard1-a0509e70dc5f11e5a8dffbb9e667c513/ma-3-big,0,8][4101796893]

ERROR 08:37:41 Cannot remove temporary or obsoleted files for 
keyspace1.standard1 due to a problem with transaction log files. Please check 
records with problems in the log messages above and fix them. Refer to the 3.0 
upgrading instructions in NEWS.txt for a description of transaction log files.
{code}
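
The per-record check behind the "Invalid checksum" line can be sketched 
roughly as follows. This is an illustration under stated assumptions, not the 
code in the patch: the class and method names are hypothetical, and it assumes 
the trailing bracketed number is a CRC32 of the preceding 
{{TYPE:[...]}} text encoded as UTF-8, which the output above does not confirm.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.CRC32;

// Hypothetical sketch of validating one transaction log line of the form
// TYPE:[fields][checksum]. What exactly is checksummed is an assumption.
final class TxnLogLineCheck {
    private static final Pattern LINE =
        Pattern.compile("^(\\w+:\\[[^\\]]*\\])\\[(\\d{1,10})\\]$");

    // CRC32 over the UTF-8 bytes of the record body (assumed scheme).
    static long crcOf(String body) {
        CRC32 crc = new CRC32();
        crc.update(body.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Returns null if the line is valid, otherwise a description of the problem.
    static String verify(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches())
            return "Failed to parse [" + line + "]";
        long stored = Long.parseLong(m.group(2));
        long computed = crcOf(m.group(1));
        if (stored != computed)
            return "Invalid checksum: [" + stored + "] should have been [" + computed + "]";
        return null;
    }
}
```

Reporting every problem found, instead of stopping at the first one, matches 
the intent described above: print the full contents of the affected log so 
that an operator can decide what to fix before restarting.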
