[jira] [Commented] (CASSANDRA-10109) Windows dtest 3.0: ttl_test.py failures

Stefania (JIRA) Thu, 27 Aug 2015 02:22:58 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-10109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716344#comment-14716344
 ]


Stefania commented on CASSANDRA-10109:
--------------------------------------

Do we still retry a number of times if we cannot read the txn logs? I guess 
not, since this should only happen in case of manual intervention or a bug. So 
the MAX_ATTEMPTS loop is gone entirely, right?

bq. {{If the commit/abort record is present, just apply that}}

Here you mean just return the final and temporary files that we found in the 
first sweep, right? Basically what we are doing at the moment?

bq. {{--if the txn log is missing, we can safely do nothing}}

Do we still return tmp files from the first run anyway?

bq. {{-- if the last record is now present, apply the logic}}

Here you mean the COMMIT/ABORT record?

bq. {{--if none of these hold we must have a bug, so throw an exception}}

I'm not totally convinced this is true. An old sstable could be fully released 
long before the transaction is completed; there is no requirement that it be 
released afterwards or at the same time. The comment on 
{{TransactionLog.obsoleted}} seems wrong.
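
As a rough illustration, the four quoted steps could be read as the decision 
procedure below. This is a hypothetical Python sketch, not the Cassandra code: 
the names {{Action}}, {{decide}} and {{reread}} are invented for illustration, 
the record strings are placeholders, and the re-read of the txn log between 
the first check and the nested steps is an assumption about the quoted 
proposal. The final branch is the one questioned above.

```python
from enum import Enum
from typing import Callable, Optional, Sequence


class Action(Enum):
    APPLY_FINAL_RECORD = "apply"  # filter the listing using COMMIT/ABORT
    DO_NOTHING = "nothing"        # txn log is gone, keep the first listing


# Record names as used in the comment; the in-memory format is hypothetical.
FINAL_RECORDS = ("COMMIT", "ABORT")


def has_final_record(records: Optional[Sequence[str]]) -> bool:
    """True if the log exists, is non-empty and ends with COMMIT or ABORT."""
    return bool(records) and records[-1] in FINAL_RECORDS


def decide(first_read: Sequence[str],
           reread: Callable[[], Optional[Sequence[str]]]) -> Action:
    """Pick an action for one txn log seen while listing a directory.

    first_read: records seen during the first sweep.
    reread:     hypothetical callback that reads the log again and returns
                None if the log file no longer exists.
    """
    # "If the commit/abort record is present, just apply that"
    if has_final_record(first_read):
        return Action.APPLY_FINAL_RECORD

    second_read = reread()

    # "if the txn log is missing, we can safely do nothing"
    if second_read is None:
        return Action.DO_NOTHING

    # "if the last record is now present, apply the logic"
    if has_final_record(second_read):
        return Action.APPLY_FINAL_RECORD

    # "if none of these hold we must have a bug, so throw an exception"
    raise RuntimeError("txn log still has no COMMIT/ABORT record")
```

In this sketch there is no retry loop: each log is read at most twice, and an 
incomplete log after the second read is reported as an error.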

bq. On startup, however, our current logic is fine. But we don't need to retry; 
we should just fail if we encounter an unrecoverable exception. This should 
simplify things.

By startup you mean anything until we complete setup in the Cassandra daemon? 
Are we positive we will never use transactions during this phase, e.g. some 
asynchronous tasks being launched and racing? I must say that I don't see 
having two different ways of listing files as simplifying things. We scrub the 
data directories pretty early, so any leftover transactions won't be present 
during startup, and the listing will simply not find any txn logs and just 
return the data files we found.

> Windows dtest 3.0: ttl_test.py failures
> ---------------------------------------
>
>                 Key: CASSANDRA-10109
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10109
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Joshua McKenzie
>            Assignee: Stefania
>              Labels: Windows
>             Fix For: 3.0.0 rc1
>
>
> ttl_test.py:TestTTL.update_column_ttl_with_default_ttl_test2
> ttl_test.py:TestTTL.update_multiple_columns_ttl_test
> ttl_test.py:TestTTL.update_single_column_ttl_test
> Errors locally are different from those on CI yesterday. Yesterday on CI we 
> had timeouts and general node hangs. Today, on all 3 tests when run locally, 
> I see:
> {noformat}
> Traceback (most recent call last):
>   File "c:\src\cassandra-dtest\dtest.py", line 532, in tearDown
>     raise AssertionError('Unexpected error in %s node log: %s' % (node.name, 
> errors))
> AssertionError: Unexpected error in node1 node log: ['ERROR [main] 2015-08-17 
> 16:53:43,120 NoSpamLogger.java:97 - This platform does not support atomic 
> directory streams (SecureDirectoryStream); race conditions when loading 
> sstable files could occurr']
> {noformat}
> This traces back to the commit for CASSANDRA-7066 today by [~Stefania] and 
> [~benedict].  Stefania - care to take this ticket and also look further into 
> whether or not we're going to have issues with 7066 on Windows? That error 
> message certainly *sounds* like it's not a good thing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
