[
https://issues.apache.org/jira/browse/CASSANDRA-8019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165484#comment-14165484
]
Joshua McKenzie commented on CASSANDRA-8019:
--------------------------------------------
{quote}Compaction + drop assumes that if refcount is zero it's safe to
delete.{quote}
It does; however, unless we can guarantee that every SSTableScanner holding
handles to the underlying files has been closed, this is an incorrect
assumption (on Windows, pre-3.0).
{quote}How are we getting into a situation where SSTableScanner (used by
compaction) still has it open when it's deleted?{quote}
Previously (before CASSANDRA-7932) we used a CloseableIterator and closed both
that and the CompactionController prior to calling
DataTracker.markCompactedSSTablesReplaced. Currently we're managing the
controller and scanners via scoped-resource management within CompactionTask
and calling markCompactedSSTablesReplaced before either is closed. This
marks the sstables obsolete, decrements the ref count, and attempts to delete
them while the scanners still have the index and data files explicitly open.
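To make the ordering issue concrete, here is a minimal sketch. It is not the CompactionTask code: FakeFile and FakeScanner are hypothetical stand-ins, and tryDelete() only simulates the Windows behaviour of refusing to delete a file that still has an open handle.

```java
public class CompactionOrderingSketch {
    /** Hypothetical stand-in for an sstable component file; counts open handles. */
    static final class FakeFile {
        int openHandles = 0;
        boolean deleted = false;

        /** Simulates Windows semantics: deletion fails while any handle is open. */
        boolean tryDelete() {
            if (openHandles > 0) {
                return false;
            }
            deleted = true;
            return true;
        }
    }

    /** Hypothetical stand-in for a scanner: holds one handle until closed. */
    static final class FakeScanner implements AutoCloseable {
        private final FakeFile file;
        private boolean closed = false;

        FakeScanner(FakeFile file) {
            this.file = file;
            file.openHandles++;
        }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                file.openHandles--;
            }
        }
    }

    /** Problem ordering: the delete is attempted while the scanner is still in scope. */
    static boolean deleteInsideScope(FakeFile file) {
        try (FakeScanner scanner = new FakeScanner(file)) {
            // The return value is computed before the scanner is closed.
            return file.tryDelete();
        }
    }

    /** Corrected ordering: the scanner is closed first, then the delete is attempted. */
    static boolean deleteAfterScope(FakeFile file) {
        try (FakeScanner scanner = new FakeScanner(file)) {
            // compaction work would happen here
        }
        return file.tryDelete();
    }

    public static void main(String[] args) {
        System.out.println("delete inside scope: " + deleteInsideScope(new FakeFile()));
        System.out.println("delete after scope:  " + deleteAfterScope(new FakeFile()));
    }
}
```

In this simulation the first call reports false and the second true, which mirrors the "Unable to delete" error showing up only when the replace happens inside the resource scope.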
Fixing the ordering in CompactionTask fixes the error this ticket was opened
for but doesn't address all instances of this type of error in unit tests on
the 2.1 branch on Windows. I can play whac-a-mole tracking all of these down,
but there's nothing stopping us from introducing further errors of this type,
since there's no contract between the readers and scanners as far as references
to underlying files are concerned. On 2.1 on Linux, or on trunk on either
platform, you'll never see anything indicating that this ordering problem has
occurred.
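As an illustration of what a contract between readers and scanners could look like, here is a rough sketch in which a scanner takes its own reference on the file, so the ref count cannot reach zero while the scanner is open. This is only one possible approach, not necessarily what any of the attached patches do; RefCountedFile and RefScanner are hypothetical names, and "deletion" is just a flag here.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ScannerRefContractSketch {
    /** Hypothetical ref-counted file: "deleted" only once the count reaches zero. */
    static final class RefCountedFile {
        // Starts at 1: the owner's (e.g. the tracker's) own reference.
        private final AtomicInteger refs = new AtomicInteger(1);
        private volatile boolean deleted = false;

        /** Takes a reference; fails if the count has already dropped to zero. */
        boolean acquire() {
            while (true) {
                int n = refs.get();
                if (n <= 0) {
                    return false;
                }
                if (refs.compareAndSet(n, n + 1)) {
                    return true;
                }
            }
        }

        /** Drops a reference; the last release triggers the (simulated) delete. */
        void release() {
            if (refs.decrementAndGet() == 0) {
                deleted = true;
            }
        }

        boolean isDeleted() {
            return deleted;
        }
    }

    /** Hypothetical scanner that holds a reference for as long as it is open. */
    static final class RefScanner implements AutoCloseable {
        private final RefCountedFile file;
        private boolean closed = false;

        RefScanner(RefCountedFile file) {
            if (!file.acquire()) {
                throw new IllegalStateException("file already released");
            }
            this.file = file;
        }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                file.release();
            }
        }
    }

    /** Returns {deleted while scanner open, deleted after scanner closed}. */
    static boolean[] run() {
        RefCountedFile file = new RefCountedFile();
        boolean deletedWhileScanning;
        try (RefScanner scanner = new RefScanner(file)) {
            file.release(); // owner marks the file obsolete and drops its reference
            deletedWhileScanning = file.isDeleted();
        }
        return new boolean[] { deletedWhileScanning, file.isDeleted() };
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println("deleted while scanning: " + result[0]);
        System.out.println("deleted after close:    " + result[1]);
    }
}
```

With a scheme like this, calling the replace step early would no longer matter, because the delete is deferred until the last scanner releases its reference.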
> Windows Unit tests and Dtests erroring due to sstable deleting task error
> -------------------------------------------------------------------------
>
> Key: CASSANDRA-8019
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8019
> Project: Cassandra
> Issue Type: Bug
> Environment: Windows 7
> Reporter: Philip Thompson
> Assignee: Joshua McKenzie
> Labels: windows
> Fix For: 2.1.1
>
> Attachments: 8019_aggressive_v1.txt, 8019_conservative_v1.txt,
> 8019_v2.txt
>
>
> Currently a large number of dtests and unit tests are erroring on windows
> with the following error in the node log:
> {code}
> ERROR [NonPeriodicTasks:1] 2014-09-29 11:05:04,383
> SSTableDeletingTask.java:89 - Unable to delete
> c:\\users\\username\\appdata\\local\\temp\\dtest-vr6qgw\\test\\node1\\data\\system\\local-7ad54392bcdd35a684174e047860b377\\system-local-ka-4-Data.db
> (it will be removed on server restart; we'll also retry after GC)\n
> {code}
> git bisect points to the following commit:
> {code}
> 0e831007760bffced8687f51b99525b650d7e193 is the first bad commit
> commit 0e831007760bffced8687f51b99525b650d7e193
> Author: Benedict Elliott Smith <[email protected]>
> Date: Fri Sep 19 18:17:19 2014 +0100
> Fix resource leak in event of corrupt sstable
> patch by benedict; review by yukim for CASSANDRA-7932
> :100644 100644 d3ee7d99179dce03307503a8093eb47bd0161681
> f55e5d27c1c53db3485154cd16201fc5419f32df M CHANGES.txt
> :040000 040000 194f4c0569b6be9cc9e129c441433c5c14de7249
> 3c62b53b2b2bd4b212ab6005eab38f8a8e228923 M src
> :040000 040000 64f49266e328b9fdacc516c52ef1921fe42e994f
> de2ca38232bee6d2a6a5e068ed9ee0fbbc5aaebe M test
> {code}
> You can reproduce this by running simple_bootstrap_test.