[
https://issues.apache.org/jira/browse/CASSANDRA-8019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joshua McKenzie updated CASSANDRA-8019:
---------------------------------------
Attachment: 8019_v2.txt
After chewing on this a bit, I've come to the conclusion that the problem here
isn't really the order of deletion or even the pre-3.0 behavior, as those files
are *eventually* successfully deleted on a subsequent GC. Our problem is that
we're logging this as an error immediately on the first failure on Windows,
where we expect some contention on ordering pre-CASSANDRA-4050, so it's not
really an error condition.
Having said that, we still want to log legitimate error conditions, so
suppressing the message or dropping it to WARN wouldn't be appropriate in
those cases.
I've attached a v2 patch against 2.0 that adds a retryCount to our
SSTableDeletingTask. On Windows only, the task prints the error message after
3 failed deletion attempts and then resets the counter. Behavior on Linux
remains at 1 failed deletion == logged. v2 quiets all deletion errors in unit
tests on 2.0 and 2.1 but should leave room for genuinely locked / undeletable
files to log after a few failures. I should note: 3 is a completely arbitrary
number, and relying on GC for eventual file deletion is of course not ideal.
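To make the proposed behavior concrete, here is a minimal sketch of the
counting logic only. It is not the contents of 8019_v2.txt: the class name
DeletionFailureTracker, the method recordFailure, and the constructor
parameters are all hypothetical, and the real patch keeps its retryCount
inside SSTableDeletingTask.

```java
// Hypothetical illustration of "log after N failed deletions on Windows,
// log immediately elsewhere". Names here are assumptions, not Cassandra APIs.
final class DeletionFailureTracker {
    private final boolean isWindows;   // assumed to be decided once by the caller
    private final int maxRetries;      // 3 in the description above (arbitrary)
    private int failures = 0;

    DeletionFailureTracker(boolean isWindows, int maxRetries) {
        this.isWindows = isWindows;
        this.maxRetries = maxRetries;
    }

    /**
     * Record one failed deletion attempt.
     * Returns true when the caller should log the "Unable to delete" error.
     */
    synchronized boolean recordFailure() {
        if (!isWindows) {
            return true;               // non-Windows: 1 failed deletion == logged
        }
        failures++;
        if (failures >= maxRetries) {
            failures = 0;              // reset so a persistent failure logs again later
            return true;
        }
        return false;                  // stay quiet and retry after the next GC
    }
}
```

With maxRetries set to 3, a Windows node stays silent for the first two
failures of a given file and logs on the third, while a file that stays locked
keeps producing a log line every third attempt.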
Thoughts [~jbellis]? I'd prefer we nip this in the bud, as this 'Unable to
delete' error is getting more prevalent on the 2.1 branch as we make further
changes and optimizations. I'm more comfortable loosening the logging
criteria for this error than retrofitting more reference counting or changing
scanner close ordering throughout the codebase.
> Windows Unit tests and Dtests erroring due to sstable deleting task error
> -------------------------------------------------------------------------
>
> Key: CASSANDRA-8019
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8019
> Project: Cassandra
> Issue Type: Bug
> Environment: Windows 7
> Reporter: Philip Thompson
> Assignee: Joshua McKenzie
> Labels: windows
> Fix For: 2.1.1
>
> Attachments: 8019_aggressive_v1.txt, 8019_conservative_v1.txt,
> 8019_v2.txt
>
>
> Currently a large number of dtests and unit tests are erroring on Windows
> with the following error in the node log:
> {code}
> ERROR [NonPeriodicTasks:1] 2014-09-29 11:05:04,383
> SSTableDeletingTask.java:89 - Unable to delete
> c:\\users\\username\\appdata\\local\\temp\\dtest-vr6qgw\\test\\node1\\data\\system\\local-7ad54392bcdd35a684174e047860b377\\system-local-ka-4-Data.db
> (it will be removed on server restart; we'll also retry after GC)\n
> {code}
> git bisect points to the following commit:
> {code}
> 0e831007760bffced8687f51b99525b650d7e193 is the first bad commit
> commit 0e831007760bffced8687f51b99525b650d7e193
> Author: Benedict Elliott Smith <[email protected]>
> Date: Fri Sep 19 18:17:19 2014 +0100
> Fix resource leak in event of corrupt sstable
> patch by benedict; review by yukim for CASSANDRA-7932
> :100644 100644 d3ee7d99179dce03307503a8093eb47bd0161681
> f55e5d27c1c53db3485154cd16201fc5419f32df M CHANGES.txt
> :040000 040000 194f4c0569b6be9cc9e129c441433c5c14de7249
> 3c62b53b2b2bd4b212ab6005eab38f8a8e228923 M src
> :040000 040000 64f49266e328b9fdacc516c52ef1921fe42e994f
> de2ca38232bee6d2a6a5e068ed9ee0fbbc5aaebe M test
> {code}
> You can reproduce this by running simple_bootstrap_test.