[ 
https://issues.apache.org/jira/browse/HDFS-10183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Avgustinov updated HDFS-10183:
------------------------------------
    Attachment: HDFS-10183.2.patch

Adding v2 of the patch, which resolves the new checkstyle warnings by renaming 
the newly {{final}} fields to uppercase. I'm a bit mystified by the test 
failures, and for me locally the test suite doesn't pass with or without the 
patch. I'll try to diagnose, but if the answer is obvious to anyone I'd 
appreciate a heads-up.

[~sjlee0], good point about 
[JLS12.4.2|https://docs.oracle.com/javase/specs/jls/se8/html/jls-12.html#jls-12.4.2].
 It indeed suggests that something slightly more subtle is going on, as the 
situation I described in the original issue would not lead to a data race -- 
the acquisition of the class loading lock would order the write to the field in 
whichever thread wins the initialization race and the read in the other 
thread(s).

I suspect the answer is actually in the previous section 
([JLS12.4.1|https://docs.oracle.com/javase/specs/jls/se8/html/jls-12.html#jls-12.4.1]),
 which describes when class loading happens (i.e. when the procedure in 12.4.2 
applies). In particular, it says "A class or interface type T will be 
initialized immediately before the first occurrence of \[an instance creation 
or static member access\]". In other words, if the JVM can tell that a class 
has already been initialized, it does not need to perform the procedure of 
12.4.2.

With this in mind, a potential explanation for the problem is this: One thread 
initializes the class with the non-{{final}} {{ThreadLocal}} field, and another 
tries to use it shortly after, but without triggering its own initialization. 
Because it does not acquire the class loading lock, there is no happens-before 
relationship between its read of the static field and the write that happened 
during class initialization, and if we're unlucky the write has not in fact 
propagated.

I tried to come up with a small example demonstrating the issue, but couldn't 
trigger it. I speculate that it needs a certain critical mass of complexity in 
order to make the HotSpot compiler kick in and apply the necessary 
optimizations. The stack trace shown by [~busbey] in HADOOP-11969 is a clear 
example of it being a problem in practice, though -- in the state of the tree 
before that fix, the only write to the {{DECODER_FACTORY}} field is the field 
initializer (which sets it to non-{{null}}), and so the exception observed 
would be impossible without a data race along the lines described above.

> Prevent race condition during class initialization
> --------------------------------------------------
>
>                 Key: HDFS-10183
>                 URL: https://issues.apache.org/jira/browse/HDFS-10183
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.9.0
>            Reporter: Pavel Avgustinov
>            Assignee: Pavel Avgustinov
>            Priority: Minor
>             Fix For: 2.9.0
>
>         Attachments: HADOOP-12944.1.patch, HDFS-10183.2.patch
>
>
> In HADOOP-11969, [~busbey] tracked down a non-deterministic 
> {{NullPointerException}} to an oddity in the Java memory model: When multiple 
> threads trigger the loading of a class at the same time, one of them wins and 
> creates the {{java.lang.Class}} instance; the others block during this 
> initialization, but once it is complete they may obtain a reference to the 
> {{Class}} which has non-{{final}} fields still containing their default (i.e. 
> {{null}}) values. This leads to runtime failures that are hard to debug or 
> diagnose.
> HADOOP-11969 observed that {{ThreadLocal}} fields, by their very nature, are 
> very likely to be accessed from multiple threads, and thus the problem is 
> particularly severe there. Consequently, the patch removed all occurrences of 
> the issue in the code base.
> Unfortunately, since then HDFS-7964 has [reverted one of the fixes during a 
> refactoring|https://github.com/apache/hadoop/commit/2151716832ad14932dd65b1a4e47e64d8d6cd767#diff-0c2e9f7f9e685f38d1a11373b627cfa6R151],
>  and introduced a [new instance of the 
> problem|https://github.com/apache/hadoop/commit/2151716832ad14932dd65b1a4e47e64d8d6cd767#diff-6334d0df7d9aefbccd12b21bb7603169R43].
> The attached patch addresses the issue by adding the missing {{final}} 
> modifier in these two cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to