[ https://issues.apache.org/jira/browse/HDFS-10267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229424#comment-15229424 ]
Colin Patrick McCabe commented on HDFS-10267:
---------------------------------------------
At a high level, the test works like this:
1. Create the {{slowWriterThread}} thread and make it the {{Writer}} for
{{recoveringBlock}} by calling {{FsDatasetImpl#createRbw}}. {{FsDatasetImpl}}
grabs the {{Thread}} object and stores it in the {{ReplicaInPipeline}}.
2. Create the {{stopWriterThread}} thread and have it call an operation that
calls {{ReplicaInPipeline#stopWriter}} on {{recoveringBlock}}. This interrupts
{{slowWriterThread}}, which then gets an {{InterruptedException}}.
3. {{slowWriterThread}} receives the {{InterruptedException}} and sets an
{{AtomicBoolean}}, but it doesn't exit, so {{stopWriterThread}} hangs waiting
for it.
4. Meanwhile, the main thread waits until it sees the {{AtomicBoolean}} set in step #3.
5. The main thread calls some operation on {{FsDatasetImpl}} that needs to take
the lock. If {{stopWriterThread}} failed to drop the lock when calling
{{stopWriter}}, the test deadlocks here and hits its timeout.
Otherwise, the test succeeds.
6. The main thread tells {{slowWriterThread}} to exit, then joins all threads
and verifies that none of them exited abnormally.
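For anyone who wants to see the shape of the steps above without the HDFS classes, here is a minimal, self-contained sketch. It is a simplified model under stated assumptions, not the actual test from the patch: a {{ReentrantLock}} stands in for the {{FsDatasetImpl}} lock, a plain thread plays the role of {{ReplicaInPipeline#stopWriter}} (interrupt the writer, then join it), and the names {{StopWriterLockTest}}, {{mainThreadGotLock}}, and {{runScenario}} are hypothetical. It also skips the final "no thread exited abnormally" check.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, simplified model of the six test steps; not the actual HDFS test.
public class StopWriterLockTest {

    // Returns true if the main thread could take the dataset lock (step 5)
    // while the stopper thread was blocked waiting for the writer to exit.
    public static boolean mainThreadGotLock(boolean holdLockWhileStopping) {
        try {
            return runScenario(holdLockWhileStopping);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running scenario", e);
        }
    }

    private static boolean runScenario(boolean holdLockWhileStopping) throws InterruptedException {
        ReentrantLock datasetLock = new ReentrantLock();   // stands in for the FsDatasetImpl lock
        AtomicBoolean gotInterrupt = new AtomicBoolean();  // set by the writer in step 3
        AtomicBoolean writerMayExit = new AtomicBoolean(); // set by main in step 6

        // Step 1: a writer that records the interrupt but keeps running.
        Thread slowWriter = new Thread(() -> {
            while (!writerMayExit.get()) {
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    gotInterrupt.set(true); // step 3: note the interrupt, but do not exit
                }
            }
        });
        slowWriter.start();

        // Step 2: a stopper that interrupts the writer, then waits for it to exit.
        Thread stopper = new Thread(() -> {
            datasetLock.lock();
            try {
                if (!holdLockWhileStopping) {
                    datasetLock.unlock(); // drop the lock before the slow wait
                }
                slowWriter.interrupt();
                slowWriter.join(); // blocks until main lets the writer exit
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                if (datasetLock.isHeldByCurrentThread()) {
                    datasetLock.unlock();
                }
            }
        });
        stopper.start();

        // Step 4: wait until the writer has seen the interrupt.
        while (!gotInterrupt.get()) {
            Thread.sleep(5);
        }

        // Step 5: try to take the lock; timing out here is the "deadlock" case.
        boolean acquired = datasetLock.tryLock(1, TimeUnit.SECONDS);
        if (acquired) {
            datasetLock.unlock();
        }

        // Step 6: let the writer exit, then join both threads.
        writerMayExit.set(true);
        slowWriter.join();
        stopper.join();
        return acquired;
    }

    public static void main(String[] args) {
        System.out.println("lock dropped before stop: " + mainThreadGotLock(false));
        System.out.println("lock held during stop:    " + mainThreadGotLock(true));
    }
}
```

When the stopper keeps the lock while joining the writer (the buggy case), the main thread's timed {{tryLock}} gives up after one second and the method returns false; when the lock is dropped first, it returns true. The timed {{tryLock}} takes the place of the test-level timeout so the sketch never hangs.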
> Extra "synchronized" on FsDatasetImpl#recoverAppend and
> FsDatasetImpl#recoverClose
> ----------------------------------------------------------------------------------
>
> Key: HDFS-10267
> URL: https://issues.apache.org/jira/browse/HDFS-10267
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.8.0
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Attachments: HDFS-10267.001.patch, HDFS-10267.002.patch,
> HDFS-10267.003.patch, HDFS-10267.004.patch
>
>
> There is an extra "synchronized" on FsDatasetImpl#recoverAppend and
> FsDatasetImpl#recoverClose that prevents the HDFS-8496 fix from working as
> intended. This should be removed.
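Why one extra keyword defeats the fix: Java monitors are reentrant, so when a method is itself {{synchronized}}, leaving an inner {{synchronized (this)}} block does not release the monitor taken on method entry. The sketch below assumes the HDFS-8496 fix works by looking up the writer inside a {{synchronized}} block and then stopping it after leaving that block. {{DatasetModel}}, {{recoverFixed}}, {{recoverWithExtraSync}}, and {{lockHeldDuringStop}} are hypothetical names, not HDFS APIs; {{Thread.holdsLock}} reports whether the monitor is still held while the slow stop step runs.

```java
// Hypothetical model of why a method-level "synchronized" defeats the
// "drop the lock before stopping the writer" pattern. Not HDFS code.
public class DatasetModel {

    // Intended pattern (assumed HDFS-8496 shape): do the lookup under the
    // monitor, then run the slow stop step after leaving the block.
    public void recoverFixed(Runnable slowStop) {
        synchronized (this) {
            // look up the replica and its writer thread here
        }
        slowStop.run(); // monitor is not held here
    }

    // Same body plus the extra method-level "synchronized" from HDFS-10267.
    // Monitors are reentrant: leaving the inner block only undoes the inner
    // acquisition, so the monitor taken on method entry is still held.
    public synchronized void recoverWithExtraSync(Runnable slowStop) {
        synchronized (this) {
            // look up the replica and its writer thread here
        }
        slowStop.run(); // monitor is still held here
    }

    // Reports whether the calling thread held this object's monitor while
    // the slow stop step ran.
    public static boolean lockHeldDuringStop(boolean extraSync) {
        DatasetModel dataset = new DatasetModel();
        boolean[] held = new boolean[1];
        Runnable probe = () -> held[0] = Thread.holdsLock(dataset);
        if (extraSync) {
            dataset.recoverWithExtraSync(probe);
        } else {
            dataset.recoverFixed(probe);
        }
        return held[0];
    }

    public static void main(String[] args) {
        System.out.println("fixed:      " + lockHeldDuringStop(false)); // false
        System.out.println("extra sync: " + lockHeldDuringStop(true));  // true
    }
}
```

In the real code, a stop step that runs with the monitor still held blocks every other caller that needs the dataset lock until the writer exits, which is exactly what step 5 of the test is designed to catch.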
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)