[
https://issues.apache.org/jira/browse/CASSANDRA-8019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joshua McKenzie updated CASSANDRA-8019:
---------------------------------------
Attachment: 8019_v3.txt
v3 attached. It adds refcounting on the SSTR from within SSTableScanner and
updates SSTableRewriterTest to use try-with-resources for CompactionControllers
and Scanners. All unit tests pass on linux, dtest failures match the CI
environment, and "Unable to delete" errors in the windows unit tests on the
2.1 branch are greatly reduced. I still see some "Unable to delete" messages
at runtime when forcing compaction on a loaded system, but those are also
reduced and I'll track them down in a separate effort.
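As a rough illustration of why try-with-resources helps in the tests, the
sketch below shows that Java closes resources in the reverse order of their
declaration, so a scanner declared after its controller is closed first. The
Tracked class and the names "controller" and "scanner" are hypothetical
stand-ins, not the Cassandra classes or the code in the patch.

```java
import java.util.ArrayList;
import java.util.List;

public class CloseOrderSketch {
    // Records the order in which resources are closed.
    static final List<String> closed = new ArrayList<>();

    // Hypothetical stand-in for a CompactionController or a scanner.
    static final class Tracked implements AutoCloseable {
        private final String name;

        Tracked(String name) {
            this.name = name;
        }

        @Override
        public void close() {
            closed.add(name);
        }
    }

    // Opens two resources and returns the order in which they were closed.
    static List<String> run() {
        closed.clear();
        try (Tracked controller = new Tracked("controller");
             Tracked scanner = new Tracked("scanner")) {
            // A test body would iterate the scanner here.
        }
        return new ArrayList<>(closed);
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [scanner, controller]
    }
}
```

The same ordering holds when the body throws, which is what makes the pattern
useful for tests that would otherwise leave a scanner open on failure.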
I chose to go with refcounting rather than simply changing the ordering in
CompactionTask as we need some codification of the ordering relationship
between scanners and sstables in order to prevent this type of "error" in the
future.
The SSTableScanner relies on internal data structures within the SSTR. The
previous code did keep the SSTR reachable and prevented GC, both through the
pointer it holds internally and through the ifile and dfile references, but
our previous logical structure, in which an open SSTableScanner had no
relationship to SSTR deletion, was misleading. We replicate some of the
references in the scanner, so the SSTR can technically be deleted out of
order, and we rely on the filesystem to keep the file open while we hold a
handle to it; even so, a clearer relationship between the components is
preferable IMO.
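To make the intended relationship concrete, here is a minimal sketch of a
scanner that takes a reference on its reader when it opens and drops it on
close, so deletion can only happen once the last scanner is gone. Reader,
Scanner, acquire(), release() and isDeleted() are hypothetical names chosen
for illustration; this is not the code in 8019_v3.txt, and the "deletion" is
just a flag.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RefCountSketch {
    // Hypothetical stand-in for an sstable reader.
    static final class Reader {
        // Starts at 1: the owner's own reference.
        private final AtomicInteger refs = new AtomicInteger(1);
        private volatile boolean deleted = false;

        // Takes a reference; fails once the count has already reached zero.
        boolean acquire() {
            while (true) {
                int n = refs.get();
                if (n <= 0) {
                    return false;
                }
                if (refs.compareAndSet(n, n + 1)) {
                    return true;
                }
            }
        }

        // Drops a reference; the last release performs the deletion.
        void release() {
            if (refs.decrementAndGet() == 0) {
                deleted = true; // real code would delete the files here
            }
        }

        boolean isDeleted() {
            return deleted;
        }
    }

    // Hypothetical scanner that pins its reader for as long as it is open.
    static final class Scanner implements AutoCloseable {
        private final Reader reader;
        private boolean closed = false;

        Scanner(Reader reader) {
            if (!reader.acquire()) {
                throw new IllegalStateException("reader already released");
            }
            this.reader = reader;
        }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                reader.release();
            }
        }
    }

    // Returns {deleted while the scanner was open, deleted after it closed}.
    static boolean[] demo() {
        Reader reader = new Reader();
        boolean whileOpen;
        try (Scanner scanner = new Scanner(reader)) {
            reader.release(); // owner marks the sstable obsolete first
            whileOpen = reader.isDeleted();
        }
        return new boolean[] { whileOpen, reader.isDeleted() };
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println(r[0] + " " + r[1]); // prints false true
    }
}
```

With this shape the owner can release its reference in any order relative to
the scanners; the deletion is deferred until the last holder lets go, rather
than relying on callers to close things in the right sequence.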
[~jbellis]: I threw you on this as reviewer when I was leaning towards the
log-suppression route, as it was a trivial effort. [~krummas]: would you be
willing to review this, as you've been in the compaction and SSTableRewriter
space recently?
> Windows Unit tests and Dtests erroring due to sstable deleting task error
> -------------------------------------------------------------------------
>
> Key: CASSANDRA-8019
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8019
> Project: Cassandra
> Issue Type: Bug
> Environment: Windows 7
> Reporter: Philip Thompson
> Assignee: Joshua McKenzie
> Labels: windows
> Fix For: 2.1.3
>
> Attachments: 8019_aggressive_v1.txt, 8019_conservative_v1.txt,
> 8019_v2.txt, 8019_v3.txt
>
>
> Currently a large number of dtests and unit tests are erroring on windows
> with the following error in the node log:
> {code}
> ERROR [NonPeriodicTasks:1] 2014-09-29 11:05:04,383
> SSTableDeletingTask.java:89 - Unable to delete
> c:\\users\\username\\appdata\\local\\temp\\dtest-vr6qgw\\test\\node1\\data\\system\\local-7ad54392bcdd35a684174e047860b377\\system-local-ka-4-Data.db
> (it will be removed on server restart; we'll also retry after GC)\n
> {code}
> git bisect points to the following commit:
> {code}
> 0e831007760bffced8687f51b99525b650d7e193 is the first bad commit
> commit 0e831007760bffced8687f51b99525b650d7e193
> Author: Benedict Elliott Smith <[email protected]>
> Date: Fri Sep 19 18:17:19 2014 +0100
> Fix resource leak in event of corrupt sstable
> patch by benedict; review by yukim for CASSANDRA-7932
> :100644 100644 d3ee7d99179dce03307503a8093eb47bd0161681
> f55e5d27c1c53db3485154cd16201fc5419f32df M CHANGES.txt
> :040000 040000 194f4c0569b6be9cc9e129c441433c5c14de7249
> 3c62b53b2b2bd4b212ab6005eab38f8a8e228923 M src
> :040000 040000 64f49266e328b9fdacc516c52ef1921fe42e994f
> de2ca38232bee6d2a6a5e068ed9ee0fbbc5aaebe M test
> {code}
> You can reproduce this by running simple_bootstrap_test.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)