[ 
https://issues.apache.org/jira/browse/HADOOP-13726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15931329#comment-15931329
 ] 

Manjunath Anand edited comment on HADOOP-13726 at 3/19/17 3:00 AM:
-------------------------------------------------------------------

Thanks [[email protected]] for praise in comments , I felt happy. Yes the 
array in catch block is to get around inside the closure and capture the 
exception.

Thanks [~cnauroth] for your inputs. Yes I agree that the computeIfAbsent is 
subjected to locking of non equal keys having same hashbucket and usually the 
chances of it happening are lesser when a large initial size is given to the 
ConcurrentHashMap and when the hash distribution is better but it can still be 
a problem.

After researching a little bit more on the challenge we have to pass the uri 
and conf to the load method, I stumbled upon 
[https://github.com/google/guava/wiki/CachesExplained#from-a-callable] and 
found a way to pass the URI and Conf to the Callable method without adding them 
to the Key class and am presenting the code below:-
{code}
private com.google.common.cache.Cache<Key, FileSystem> map = 
CacheBuilder.newBuilder().build();
    private FileSystem getInternal(final URI uri, final Configuration conf, Key 
key)
            throws IOException {
      /**
       * Calling getIfPresent to avoid unnecessary creation of Callable
       * object everytime getInternal is called.
       */
      FileSystem fs = map.getIfPresent(key);
      if(fs != null) {
        return fs;
      }
      if (map.size() == 0
              && !ShutdownHookManager.get().isShutdownInProgress()) {
        ShutdownHookManager.get().addShutdownHook(clientFinalizer, 
SHUTDOWN_HOOK_PRIORITY);
      }
      Callable<FileSystem> cb = new Callable<FileSystem>() {
          @Override
          public FileSystem call() throws Exception {
            FileSystem fs = createFileSystem(uri, conf);
            fs.key = key;
            if (conf.getBoolean(
                    FS_AUTOMATIC_CLOSE_KEY, FS_AUTOMATIC_CLOSE_DEFAULT)) {
              toAutoClose.add(key);
            }
            return fs;
          }
        };

      try {
        fs = map.get(key,cb);
      } catch (ExecutionException e) {
        LOG.error("Exception while creating file system for key "+key,e);
        if(e.getCause() instanceof IOException) throw (IOException)e.getCause();
      }

      return fs;
    }
{code}

Note that using guava Cache instead of java Map would mean additional code 
changes to existing method references such as entrySet, isEmpty, remove, keySet 
of java Map in the FileSystem and replacing it with corresponding methods from 
guava Cache.

Please let me know your thoughts about this approach and code. 


was (Author: manju_hadoop):
Thanks [[email protected]] for praise in comments , I felt happy. Yes the 
array in catch block is to get around inside the closure and capture the 
exception.

Thanks [~cnauroth] for your inputs. Yes I agree that the computeIfAbsent is 
subjected to locking of non equal keys having same hashbucket and usually the 
chances of it happening are lesser when a large initial size is given to the 
ConcurrentHashMap and when the hash distribution is better but it can still be 
a problem.

After researching a little bit more on the challenge we have to pass the uri 
and conf to the load method, I stumbled upon 
[https://github.com/google/guava/wiki/CachesExplained#from-a-callable] and 
found a way to pass the URI and Conf to the Callable method without adding them 
to the Key class and am presenting the code below:-
{code}
private com.google.common.cache.Cache<Key, FileSystem> map = 
CacheBuilder.newBuilder().build();
    private FileSystem getInternal(final URI uri, final Configuration conf, Key 
key)
            throws IOException {
      /**
       * Calling getIfPresent to avoid unnecessary creation of Callable
       * object everytime getInternal is called.
       */
      FileSystem fs = map.getIfPresent(key);
      if(fs != null) {
        return fs;
      }
      if (map.size() == 0
              && !ShutdownHookManager.get().isShutdownInProgress()) {
        ShutdownHookManager.get().addShutdownHook(clientFinalizer, 
SHUTDOWN_HOOK_PRIORITY);
      }
      Callable<FileSystem> cb = new Callable<FileSystem>() {
          @Override
          public FileSystem call() throws Exception {
            FileSystem fs = createFileSystem(uri, conf);
            if(fs!=null) throw new IOException();
            fs.key = key;
            if (conf.getBoolean(
                    FS_AUTOMATIC_CLOSE_KEY, FS_AUTOMATIC_CLOSE_DEFAULT)) {
              toAutoClose.add(key);
            }
            return fs;
          }
        };

      try {
        fs = map.get(key,cb);
      } catch (ExecutionException e) {
        LOG.error("Exception while creating file system for key "+key,e);
        if(e.getCause() instanceof IOException) throw (IOException)e.getCause();
      }

      return fs;
    }
{code}

Note that using guava Cache instead of java Map would mean additional code 
changes to existing method references such as entrySet, isEmpty, remove, keySet 
of java Map in the FileSystem and replacing it with corresponding methods from 
guava Cache.

Please let me know your thoughts about this approach and code. 

> Enforce that FileSystem initializes only a single instance of the requested 
> FileSystem.
> ---------------------------------------------------------------------------------------
>
>                 Key: HADOOP-13726
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13726
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>            Reporter: Chris Nauroth
>            Assignee: Manjunath Anand
>
> The {{FileSystem}} cache is intended to guarantee reuse of instances by 
> multiple call sites or multiple threads.  The current implementation does 
> provide this guarantee, but there is a brief race condition window during 
> which multiple threads could perform redundant initialization.  If the file 
> system implementation has expensive initialization logic, then this is 
> wasteful.  This issue proposes to eliminate that race condition and guarantee 
> initialization of only a single instance.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to