[
https://issues.apache.org/jira/browse/HDFS-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17165674#comment-17165674
]
Stephen O'Donnell commented on HDFS-15493:
------------------------------------------
Hi [~smarthan]. Thanks for this patch. I think it is a good idea - I have some
thoughts below on things we could try which might improve things further.
Only one thread can update the cacheMap at a time and only one can update the
blocksMap, due to locking. The calls to these methods already process a batch,
so they can hold the lock for a relatively long time. With that in mind, I
wonder if the default of 4 threads makes sense - only 2 can ever be active at
any time, and I think it would be possible for all 4 threads to be attempting
to update the cacheMap while none are updating the blocksMap. That means 2 or 3
threads will always be blocked.
I think it would be worth testing two single-threaded executor pools - one for
the cacheMap and one for the blocksMap - and seeing if that performs the same
or better. What do you think?
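Roughly what I have in mind is the sketch below - the class, method and batch
types are just illustrative, not taken from the patch:
{code}
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative only: one dedicated thread per shared structure, so the two
// update paths never block each other, and batches for the same map are
// applied in submission order without any extra locking.
class SingleThreadedMapUpdaters {
  private final ExecutorService nameCacheUpdateExecutor =
      Executors.newSingleThreadExecutor();
  private final ExecutorService blocksMapUpdateExecutor =
      Executors.newSingleThreadExecutor();

  void submitBatch(List<Object> inodeBatch) {
    // Hand the name cache work and the blocks map work for the same batch
    // to their own executors; each queue is drained by a single worker.
    nameCacheUpdateExecutor.submit(() -> updateNameCache(inodeBatch));
    blocksMapUpdateExecutor.submit(() -> updateBlocksMap(inodeBatch));
  }

  private void updateNameCache(List<Object> batch) {
    // placeholder for the existing name cache update
  }

  private void updateBlocksMap(List<Object> batch) {
    // placeholder for the existing blocks map update
  }
}
{code}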
I am not sure if waiting only 1ms before failing would give enough time for the
executor to complete pending tasks. It may be possible for there to be a lot of
queued requests which take a few seconds to finish processing:
{code}
if (blocksMapUpdateExecutor != null) {
  blocksMapUpdateExecutor.shutdown();
  try {
    while (!blocksMapUpdateExecutor.isTerminated()) {
      blocksMapUpdateExecutor.awaitTermination(1, TimeUnit.MILLISECONDS);
    }
  } catch (InterruptedException e) {
    LOG.error("Interrupted waiting for blocksMap update threads.", e);
    throw new IOException(e);
  }
}
{code}
We could wait 5 seconds and, if there is a timeout, log a warning and then wait
again, perhaps up to 10 times before failing. This would also let us know if
the INodeDirectorySection load is having to wait on the new background tasks
before the next stage can start.
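As a rough sketch of that bounded wait, reusing the names from the snippet
above (the retry count and the log messages are just my assumption):
{code}
if (blocksMapUpdateExecutor != null) {
  blocksMapUpdateExecutor.shutdown();
  try {
    int attempts = 0;
    // Wait 5 seconds per attempt, warn on each timeout, fail after 10 attempts.
    while (!blocksMapUpdateExecutor.awaitTermination(5, TimeUnit.SECONDS)) {
      attempts++;
      LOG.warn("Still waiting for blocksMap update threads to terminate, "
          + "attempt {} of 10.", attempts);
      if (attempts >= 10) {
        throw new IOException(
            "Timed out waiting for blocksMap update threads to terminate.");
      }
    }
  } catch (InterruptedException e) {
    LOG.error("Interrupted waiting for blocksMap update threads.", e);
    throw new IOException(e);
  }
}
{code}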
I would like to avoid the changes in FSImageFormatProtobuf.loadInternal() and
passing all the null values to `inodeLoader.loadINodeDirectorySection(...)` if
we can. I understand those changes are needed to shut down the new executor.
Therefore, let's wait and see how two single-threaded executors work, and
whether we need to wait on the thread pool to shut down, as that may influence
how we shut down the executors.
If there is a delay in the thread pools shutting down, then we could consider
moving the `blocksMapUpdateExecutor.shutdown()` call into a
Loader.shutdownExecutors() method which we call after loading all sections.
Don't make this change until we see what happens with the other experiments
above.
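If we do end up going that way, the shape I am imagining is roughly the
following - the field names and the 60 second bound are just guesses, not from
the patch:
{code}
import java.io.IOException;
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative only: the Loader owns both executors and shuts them down
// together once every section has been loaded, instead of doing it inside
// loadINodeDirectorySection().
class LoaderShutdownSketch {
  private ExecutorService nameCacheUpdateExecutor;
  private ExecutorService blocksMapUpdateExecutor;

  void shutdownExecutors() throws IOException {
    for (ExecutorService executor :
        Arrays.asList(nameCacheUpdateExecutor, blocksMapUpdateExecutor)) {
      if (executor == null) {
        continue;
      }
      executor.shutdown();
      try {
        // A single generous bound here; the retrying wait sketched earlier
        // would work just as well.
        if (!executor.awaitTermination(60, TimeUnit.SECONDS)) {
          throw new IOException("Timed out waiting for fsimage update threads.");
        }
      } catch (InterruptedException e) {
        throw new IOException(e);
      }
    }
  }
}
{code}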
Can I also ask:
1. Did you try HDFS-13693 and did it make any further speed improvement?
2. Could you try my suggestion with two single threaded executors and see what
difference it makes to the runtime?
3. Would you be able to run a test with HDFS-14617 disabled to give us an idea
of how much HDFS-14617 improves things on its own?
> Update block map and name cache in parallel while loading fsimage.
> ------------------------------------------------------------------
>
> Key: HDFS-15493
> URL: https://issues.apache.org/jira/browse/HDFS-15493
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: Chengwei Wang
> Priority: Major
> Attachments: HDFS-15493.001.patch
>
>
> While loading the INodeDirectorySection of the fsimage, the loader updates the
> name cache and block map after adding each inode file to its inode directory.
> Running these steps in parallel would reduce the time cost of fsimage loading.
> In our test case, with patches HDFS-13694 and HDFS-14617, the time cost to load
> the fsimage (220M files & 240M blocks) is 470s; with this patch, the time cost
> is reduced to 410s.