[ https://issues.apache.org/jira/browse/HDFS-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17165674#comment-17165674 ]

Stephen O'Donnell commented on HDFS-15493:
------------------------------------------

Hi [~smarthan]. Thanks for this patch. I think it is a good idea - I have some 
thoughts below on things we should try which might improve it further.

Due to locking, only one thread can update the cache map at a time and only one 
can update the block map. The calls to these methods already process a batch, 
so they can hold the lock for a relatively long time. With that in mind, I 
wonder if the default of 4 threads makes sense - only 2 can ever be active at 
any time, and it would be possible for all 4 threads to be attempting to update 
the cacheMap while none are updating the blockMap, so 2 or 3 threads will 
always be blocked.

I think it would be worth testing two single-threaded executor pools - one for 
the cacheMap and one for the blockMap - and seeing if that performs the same or 
better. What do you think?
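
To make the suggestion concrete, here is a minimal sketch of what I have in 
mind - the class, field and method names (TwoExecutorSketch, 
cacheMapUpdateExecutor, addToNameCache, updateBlocksMap) are only illustrative, 
not code from the patch:

{code}
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative only: one single-threaded pool per shared structure, so at most
// one thread ever touches the name cache and one the blocks map, and the pool
// threads never block each other on the same lock.
class TwoExecutorSketch {
  private final ExecutorService cacheMapUpdateExecutor =
      Executors.newSingleThreadExecutor();
  private final ExecutorService blocksMapUpdateExecutor =
      Executors.newSingleThreadExecutor();

  void submitBatch(List<Long> inodeBatch) {
    // Hand each batch to the executor that owns the relevant map; both maps
    // are then updated in parallel with the main loading thread.
    cacheMapUpdateExecutor.submit(() -> addToNameCache(inodeBatch));
    blocksMapUpdateExecutor.submit(() -> updateBlocksMap(inodeBatch));
  }

  private void addToNameCache(List<Long> batch) { /* update name cache */ }
  private void updateBlocksMap(List<Long> batch) { /* update blocks map */ }
}
{code}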

I am not sure if waiting only 1ms before failing would give enough time for the 
executor to complete pending tasks. It may be possible for there to be a lot of 
queued requests which take a few seconds to finish processing:

{code}
      if (blocksMapUpdateExecutor != null) {
        blocksMapUpdateExecutor.shutdown();
        try {
          while (!blocksMapUpdateExecutor.isTerminated()) {
            blocksMapUpdateExecutor.awaitTermination(1, TimeUnit.MILLISECONDS);
          }
        } catch (InterruptedException e) {
          LOG.error("Interrupted waiting for blocksMap update threads.", e);
          throw new IOException(e);
        }
      }
{code}

We could wait 5 seconds and, if there is a timeout, log a warning and then wait 
again, perhaps 10 times before failing. This would also let us know if the 
INodeDirectorySection load is having to wait on the new background tasks before 
the next stage can start.
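
Something like the following is what I mean - the 5 second slice and 10 retries 
are just suggestions, and it assumes the same fields (blocksMapUpdateExecutor, 
LOG) and imports as the snippet above:

{code}
      if (blocksMapUpdateExecutor != null) {
        blocksMapUpdateExecutor.shutdown();
        try {
          int attempts = 0;
          // awaitTermination returns false on timeout; warn and retry a
          // bounded number of times so a slow queue is visible in the logs.
          while (!blocksMapUpdateExecutor.awaitTermination(5, TimeUnit.SECONDS)) {
            attempts++;
            LOG.warn("Still waiting for blocksMap update threads after {}s.",
                attempts * 5);
            if (attempts >= 10) {
              throw new IOException(
                  "Timed out waiting for blocksMap update threads.");
            }
          }
        } catch (InterruptedException e) {
          LOG.error("Interrupted waiting for blocksMap update threads.", e);
          throw new IOException(e);
        }
      }
{code}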

I would like to avoid the changes in FSImageFormatProtobuf.loadInternal() and 
passing all the null values to `inodeLoader.loadINodeDirectorySection(...)` if 
we can. I understand those changes are needed to shut down the new executor. 
Therefore, let's wait and see how the two single-threaded executors work, and 
whether we need to wait on the thread pool to shut down, as that may influence 
how we shut down the executors.

If there is a delay in the thread pools shutting down, then we could consider 
moving the `blocksMapUpdateExecutor.shutdown()` call into a 
Loader.shutdownExecutors() method which we call after loading all sections. 
Don't make this change until we see what happens with the other experiments 
above.
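
For reference only, a shutdownExecutors() along these lines is roughly what I 
mean - the method name and structure are hypothetical (and it assumes 
java.util.Arrays is imported), not a concrete proposal yet:

{code}
      // Hypothetical Loader method, called once after all sections are loaded.
      void shutdownExecutors() throws IOException {
        for (ExecutorService executor : Arrays.asList(
            cacheMapUpdateExecutor, blocksMapUpdateExecutor)) {
          if (executor == null) {
            continue;
          }
          executor.shutdown();
          try {
            if (!executor.awaitTermination(60, TimeUnit.SECONDS)) {
              throw new IOException("Timed out waiting for fsimage executors.");
            }
          } catch (InterruptedException e) {
            throw new IOException(e);
          }
        }
      }
{code}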

Can I also ask:

1. Did you try HDFS-13693 and did it make any further speed improvement?

2. Could you try my suggestion with two single threaded executors and see what 
difference it makes to the runtime?

3. Would you be able to run a test with HDFS-14617 disabled to give us an idea 
of how much HDFS-14617 improves things on its own?

> Update block map and name cache in parallel while loading fsimage.
> ------------------------------------------------------------------
>
>                 Key: HDFS-15493
>                 URL: https://issues.apache.org/jira/browse/HDFS-15493
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Chengwei Wang
>            Priority: Major
>         Attachments: HDFS-15493.001.patch
>
>
> While loading the INodeDirectorySection of the fsimage, the name cache and 
> block map are updated after each inode file is added to its inode directory. 
> Running these steps in parallel would reduce the time cost of fsimage loading.
> In our test case, with patches HDFS-13694 and HDFS-14617 applied, the time 
> cost to load an fsimage (220M files & 240M blocks) is 470s; with this patch, 
> the time cost is reduced to 410s.



