[ 
https://issues.apache.org/jira/browse/HADOOP-12829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15157413#comment-15157413
 ] 

Sangjin Lee commented on HADOOP-12829:
--------------------------------------

I am +1 with the change proposed here. That said, I'd like to add a little more 
context to this.

I agree that as a rule one should always restore the interrupt upon catching 
the {{InterruptedException}} in order to make the thread interruptible. 
However, in this particular case the issue becomes bit academic. This thread is 
private to the {{FileSystem}} class, meaning that one cannot easily obtain a 
reference to this thread and interrupt it explicitly. This thread is also a 
daemon thread, and as such it will not hold up the process when the process is 
terminating. Those two facts combined make it acceptable to have an 
uninterruptible daemon thread. In a non-test scenario, interrupting this thread 
should *not* happen. So in that sense, I don't think this is a major issue one 
way or another (in that sense I would recommend lowering the priority of the 
issue to minor).

The most important goal here is to ensure that this thread does *NOT* terminate 
under any other condition (exceptions or errors) as that would mean a 
catastrophic consequence for memory, which we're still doing.


> StatisticsDataReferenceCleaner swallows interrupt exceptions
> ------------------------------------------------------------
>
>                 Key: HADOOP-12829
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12829
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 2.8.0, 2.7.3, 2.6.4
>            Reporter: Gregory Chanan
>            Assignee: Gregory Chanan
>         Attachments: HADOOP-12829.patch
>
>
> The StatisticsDataReferenceCleaner, implemented in HADOOP-12107 swallows 
> interrupt exceptions.  Over in Solr/Sentry land, we run thread leak checkers 
> on our test code, which passed before this change and fails after it.  Here's 
> a sample report:
> {code}
> 1 thread leaked from SUITE scope at 
> org.apache.solr.handler.TestSecureReplicationHandler: 
>    1) Thread[id=16, 
> name=org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner,
>  state=WAITING, group=TGRP-TestSecureReplicationHandler]
>         at java.lang.Object.wait(Native Method)
>         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
>         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
>         at 
> org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner.run(FileSystem.java:3040)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> And here's an indication that the interrupt is being ignored:
> {code}
> 25209 T16 oahf.FileSystem$Statistics$StatisticsDataReferenceCleaner.run WARN 
> exception in the cleaner thread but it will continue to run 
> java.lang.InterruptedException
>       at java.lang.Object.wait(Native Method)
>       at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
>       at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
>       at 
> org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner.run(FileSystem.java:3040)
>       at java.lang.Thread.run(Thread.java:745)
> {code}
> This is inconsistent with how other long-running threads in hadoop, i.e. 
> PeerCache respond to being interrupted.
> The argument for doing this in HADOOP-12107 is given as 
> (https://issues.apache.org/jira/browse/HADOOP-12107?focusedCommentId=14598397&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14598397):
> {quote}
> Cleaner#run
> Catch and log InterruptedException in the while loop, such that thread does 
> not die on a spurious wakeup. It's safe since it's a daemon thread.
> {quote}
> I'm unclear on what "spurious wakeup" means and it is not mentioned in 
> https://docs.oracle.com/javase/tutorial/essential/concurrency/interrupt.html:
> {quote}
> A thread sends an interrupt by invoking interrupt on the Thread object for 
> the thread to be interrupted. For the interrupt mechanism to work correctly, 
> the interrupted thread must support its own interruption.
> {quote}
> So, I believe this thread should respect interruption.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to