leosunli commented on a change in pull request #1010: HDFS-13694. Making md5
computing being in parallel with image loading.
URL: https://github.com/apache/hadoop/pull/1010#discussion_r298063529
##########
File path:
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java
##########
@@ -172,13 +172,55 @@ public LoaderContext getLoaderContext() {
return ctx;
}
+ /**
+  * A thread that computes the image file's MD5 digest in parallel with
+  * image loading, to reduce total load time.
+  */
+ private static class DigestThread extends Thread {
+ private volatile IOException ioe = null;
+ private volatile MD5Hash digest = null;
+ private final File file;
+
+ public DigestThread(File inFile) {
+ file = inFile;
+ }
+
+ public MD5Hash getDigest() {
+ return digest;
+ }
+
+ public IOException getException() {
+ return ioe;
+ }
+
+ @Override
+ public void run() {
+ try {
+ digest = MD5FileUtils.computeMd5ForFile(file);
+ } catch (IOException e) {
+ ioe = e;
+ } catch (Throwable t) {
+ ioe = new IOException(t);
+ }
+ }
+ }
+
void load(File file) throws IOException {
long start = Time.monotonicNow();
- imgDigest = MD5FileUtils.computeMd5ForFile(file);
+ DigestThread dt = new DigestThread(file);
Review comment:
Tested the patch against the fsimage of a 70 PB cluster (200 million files and
300 million blocks); image loading time was reduced from 1210 seconds to 1105
seconds.
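The quoted hunk ends before the code that consumes DigestThread, but the pattern it implies is: start the digest thread, do the image loading on the calling thread, then join and check getException()/getDigest(). A minimal self-contained sketch of that pattern follows; it uses java.security.MessageDigest and a hex string in place of Hadoop's MD5FileUtils/MD5Hash (which are not reproduced here), so the class and method bodies are illustrative, not the actual patch:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ParallelMd5Sketch {
  // Hypothetical stand-in for the patch's DigestThread: computes a file's
  // MD5 on a background thread so the caller can overlap it with loading.
  static class DigestThread extends Thread {
    private volatile IOException ioe = null;
    private volatile byte[] digest = null;
    private final File file;

    DigestThread(File inFile) {
      file = inFile;
    }

    byte[] getDigest() {
      return digest;
    }

    IOException getException() {
      return ioe;
    }

    @Override
    public void run() {
      try (FileInputStream in = new FileInputStream(file)) {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) > 0) {
          md.update(buf, 0, n);
        }
        digest = md.digest();
      } catch (IOException e) {
        ioe = e;
      } catch (NoSuchAlgorithmException t) {
        // Mirror the patch: wrap unexpected failures in an IOException.
        ioe = new IOException(t);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    File f = File.createTempFile("fsimage", ".tmp");
    Files.write(f.toPath(), "hello".getBytes());

    DigestThread dt = new DigestThread(f);
    dt.start();
    // ... the caller would parse the image sections here, in parallel ...
    dt.join(); // wait for the checksum before verifying it
    if (dt.getException() != null) {
      throw dt.getException(); // surface any background failure
    }

    StringBuilder hex = new StringBuilder();
    for (byte b : dt.getDigest()) {
      hex.append(String.format("%02x", b));
    }
    System.out.println(hex);
    f.delete();
  }
}
```

Because the digest fields are written only by the digest thread and read only after join(), the volatile markers plus the happens-before edge of Thread.join() make the hand-off safe.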
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.