[ 
https://issues.apache.org/jira/browse/HADOOP-11209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285299#comment-14285299
 ] 

Tsuyoshi OZAWA commented on HADOOP-11209:
-----------------------------------------

[~varun_saxena], 

Thanks for the update! I spent some time reproducing the problem with a 
test. Please use the following test case instead of yours to reproduce the 
problem reported in SPARK-2546. The problem is not a 
ConcurrentModificationException but an unexpected busy loop caused by 
HashMap resizing. I confirmed locally that the problem is reproduced with the 
patch.

{code}
  /**
   * Checks that concurrent calls to set() do not send a thread into an
   * infinite loop because the underlying Map is corrupted by a resize.
   * This problem was reported as SPARK-2546.
   * @throws Exception
   */
  public void testConcurrentAccesses() throws Exception {
    out = new BufferedWriter(new FileWriter(CONFIG));
    startConfig();
    declareProperty("some.config", "xyz", "xyz", false);
    endConfig();
    Path fileResource = new Path(CONFIG);
    Configuration conf = new Configuration();
    conf.addResource(fileResource);

    class ConfigModifyThread extends Thread {
      final private Configuration config;
      final private String prefix;

      public ConfigModifyThread(Configuration conf, String prefix) {
        config = conf;
        this.prefix = prefix;
      }

      @Override
      public void run() {
        for (int i = 0; i < 100000; i++) {
          config.set("some.config.value-" + prefix + i, "value");
        }
      }
    }

    ArrayList<ConfigModifyThread> threads = new ArrayList<>();
    for (int i = 0; i < 100; i++) {
      threads.add(new ConfigModifyThread(conf, String.valueOf(i)));
    }
    for (Thread t: threads) {
      t.start();
    }
    for (Thread t: threads) {
      t.join();
    }
    // If this test finishes without going into an infinite loop, the
    // behaviour is as expected.
  }
{code}
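
For comparison, here is a minimal stand-alone sketch of the same access pattern (many threads putting distinct keys into one shared map) against a ConcurrentHashMap. It does not use Configuration at all; the class name SharedMapWriters and the method runWriters are hypothetical and exist only for illustration. With a thread-safe map every writer finishes and no entry is lost, whereas a plain java.util.HashMap gives no such guarantee under concurrent writes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-alone demo; not part of Hadoop. */
final class SharedMapWriters {

  /**
   * Starts the given number of writer threads, each putting perThread
   * distinct keys into the shared map, waits for all of them to finish,
   * and returns the resulting map size.
   */
  static int runWriters(final Map<String, String> shared, int threads,
      final int perThread) {
    Thread[] workers = new Thread[threads];
    for (int t = 0; t < threads; t++) {
      final String prefix = String.valueOf(t);
      workers[t] = new Thread(() -> {
        for (int i = 0; i < perThread; i++) {
          // The "-" separator keeps keys from different threads distinct.
          shared.put("some.config.value-" + prefix + "-" + i, "value");
        }
      });
    }
    for (Thread w : workers) {
      w.start();
    }
    try {
      for (Thread w : workers) {
        w.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while joining writers", e);
    }
    return shared.size();
  }

  public static void main(String[] args) {
    int size = runWriters(new ConcurrentHashMap<String, String>(), 8, 10000);
    System.out.println("entries written: " + size);
  }
}
```

Passing a plain HashMap to runWriters instead is unsafe: entries may be lost, and the call may fail to terminate on some JDK versions, which is the symptom the test above is meant to catch.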

The following are my comments on Configuration.java:

1. Why not simply use Collections.synchronizedSet(new HashSet<String>())?

{code}
+  private Set<String> finalParameters = Collections.newSetFromMap(
+      new ConcurrentHashMap<String, Boolean>());

+     this.finalParameters = Collections.newSetFromMap(
+         new ConcurrentHashMap<String, Boolean>());
{code}

2. updatingResource and backup should be declared as Map<String, String[]> 
rather than ConcurrentHashMap.

{code}
+  private ConcurrentHashMap<String, String[]> updatingResource;
{code}
{code}
+      ConcurrentHashMap<String, String[]> backup = 
{code}

3. The indentation of the following lines is off because of tabs. Please 
replace the tabs with 2 spaces.
{code}
-         value = DEFAULT_STRING_CHECK;
-       }
+             value = DEFAULT_STRING_CHECK;
+           }
{code}

4. Please remove trailing spaces.
{code}
+      ConcurrentHashMap<String, String[]> backup = 
{code}
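
To make points 1 and 2 concrete, here is a minimal sketch of the two set constructions and of a field declared by its interface type. The class ThreadSafeFieldsSketch and its methods are hypothetical; only the field name updatingResource is borrowed from the patch, and none of this is the actual Configuration code.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical holder class for illustration; this is not Configuration. */
final class ThreadSafeFieldsSketch {

  // Form used in the patch: a Set view backed by a ConcurrentHashMap.
  static Set<String> concurrentSet() {
    return Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
  }

  // Form suggested in point 1: a HashSet wrapped so that each call locks it.
  static Set<String> synchronizedSet() {
    return Collections.synchronizedSet(new HashSet<String>());
  }

  // Point 2: declare the field as Map, instantiate the concurrent class.
  private final Map<String, String[]> updatingResource =
      new ConcurrentHashMap<String, String[]>();

  void recordSource(String key, String source) {
    updatingResource.put(key, new String[] {source});
  }

  String[] sourcesOf(String key) {
    return updatingResource.get(key);
  }

  // Iterating over a synchronizedSet requires holding the set's own lock.
  static int countWithPrefix(Set<String> syncSet, String prefix) {
    int n = 0;
    synchronized (syncSet) {
      for (String s : syncSet) {
        if (s.startsWith(prefix)) {
          n++;
        }
      }
    }
    return n;
  }
}
```

Both forms make individual add and contains calls thread-safe. The synchronized wrapper additionally needs the explicit synchronized block shown in countWithPrefix whenever the set is iterated, while the ConcurrentHashMap-backed set can be iterated without it.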

> Configuration is not thread-safe
> --------------------------------
>
>                 Key: HADOOP-11209
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11209
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: conf
>            Reporter: Josh Rosen
>            Assignee: Varun Saxena
>         Attachments: HADOOP-11209.001.patch, HADOOP-11209.002.patch, 
> HADOOP-11209.003.patch
>
>
> {{Configuration}} objects are not fully thread-safe, which causes problems in 
> multi-threaded frameworks like Spark that use these configurations to 
> interact with existing Hadoop APIs (such as InputFormats).
> SPARK-2546 is an example of a problem caused by this lack of thread-safety.  
> In that bug, multiple concurrent modifications of the same Configuration (in 
> third-party code) caused an infinite loop because Configuration's internal 
> {{java.util.HashMap}} is not thread-safe.
> One workaround is for our code to clone Configuration objects; unfortunately, 
> this also suffers from thread-safety issues on older Hadoop versions because 
> Configuration's constructor wasn't thread-safe (HADOOP-10456).
> [Looking at a recent version of 
> Configuration.java|https://github.com/apache/hadoop/blob/d989ac04449dc33da5e2c32a7f24d59cc92de536/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java#L666],
>  it seems that the private {{updatingResource}} HashMap and 
> {{finalParameters}} HashSet fields are the only non-thread-safe collections in 
> Configuration (Java's {{Properties}} class is thread-safe), so I don't think 
> that it would be hard to make Configuration fully thread-safe.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
