[jira] [Commented] (HDFS-13694) Making md5 computing being in parallel with image loading

He Xiaoqiao (JIRA) Fri, 28 Jun 2019 00:57:53 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-13694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874756#comment-16874756
 ]


He Xiaoqiao commented on HDFS-13694:
------------------------------------

Thanks [~leosun08] for your report and patch; it is a very interesting 
improvement.
I see that you uploaded a patch here and received some comments from 
[~jojochuang], and meanwhile submitted another PR on GitHub, where [~elgoiri] 
has also given some review comments. Some of the suggestions may be 
duplicated. IMO, we should focus on one place, and I prefer to communicate here 
until the GitHub repo is completely ready. As far as I know, only the Ozone 
subproject has moved to GitHub for code reviews. [~elgoiri], [~jojochuang], 
please correct me if I am wrong.

Some minor comments on [^HDFS-13694-005.patch]:
a. Is it intended to convert every Throwable into an IOException? Will it break something?
{code:java}
+      @Override
+      public void run() {
+        try {
+          digest = MD5FileUtils.computeMd5ForFile(file);
+        } catch (Throwable t) {
+          if (t instanceof IOException) {
+            ioe = (IOException) t;
+          } else {
+            ioe = new IOException(t);
+          }
+        }
+      }
{code}
b. Do we need a configuration item to switch this feature on or off, and 
should it be enabled by default?
c. I believe this is great work and will reduce restart time, so I think it 
would be friendlier to watchers/reviewers if you attached a simple benchmark 
report.
d. It seems that the patch is based on branch-2.7; would you rebase it onto 
trunk?
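To make points (a) and (b) concrete, here is a minimal standalone sketch; it is not the patch itself. It assumes plain JDK classes in place of MD5FileUtils.computeMd5ForFile, and the names ParallelDigestSketch, ImageLoader, and the boolean "parallel" switch are hypothetical. It computes the digest on a worker thread while the caller loads the image, rethrows an IOException from the worker unchanged, and wraps anything else.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; none of these names come from the patch.
final class ParallelDigestSketch {

  /** Hypothetical stand-in for the image loading step. */
  interface ImageLoader {
    void load() throws IOException;
  }

  /** Streams the file through MD5 and returns the lowercase hex digest. */
  static String md5Hex(Path file) throws IOException {
    MessageDigest md;
    try {
      md = MessageDigest.getInstance("MD5");
    } catch (NoSuchAlgorithmException e) {
      throw new IOException(e);
    }
    try (InputStream in = new DigestInputStream(Files.newInputStream(file), md)) {
      byte[] buf = new byte[8192];
      while (in.read(buf) != -1) {
        // Reading through the DigestInputStream updates the digest.
      }
    }
    StringBuilder sb = new StringBuilder();
    for (byte b : md.digest()) {
      sb.append(String.format("%02x", b));
    }
    return sb.toString();
  }

  /**
   * Loads the image and returns the file's MD5. When "parallel" is true
   * (an assumed configuration switch), the digest runs on a worker thread
   * while the loader runs on the calling thread.
   */
  static String loadWithParallelDigest(Path file, ImageLoader loader,
      boolean parallel) throws IOException {
    if (!parallel) {
      // Sequential behaviour: digest first, then load.
      String digest = md5Hex(file);
      loader.load();
      return digest;
    }
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      Callable<String> task = () -> md5Hex(file);
      Future<String> digest = pool.submit(task);
      loader.load();
      return digest.get();
    } catch (ExecutionException e) {
      // Keep an IOException from the worker as-is; wrap everything else.
      Throwable cause = e.getCause();
      throw (cause instanceof IOException)
          ? (IOException) cause : new IOException(cause);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IOException("Interrupted while waiting for the MD5 digest", e);
    } finally {
      pool.shutdownNow();
    }
  }
}
```

Using a Future here means the worker's failure surfaces through ExecutionException, so the caller decides how to map it; whether non-IOException failures should be wrapped or propagated is exactly the open question in (a).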
Thanks [~leosun08] for your great work again.

> Making md5 computing being in parallel with image loading
> ---------------------------------------------------------
>
>                 Key: HDFS-13694
>                 URL: https://issues.apache.org/jira/browse/HDFS-13694
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: zhouyingchao
>            Assignee: Lisheng Sun
>            Priority: Major
>         Attachments: HDFS-13694-001.patch, HDFS-13694-002.patch, 
> HDFS-13694-003.patch, HDFS-13694-004.patch, HDFS-13694-005.patch
>
>
> During namenode image loading, it first computes the md5 and then loads the 
> image. Actually, these two steps can run in parallel.
>  Testing this patch against an fsimage of a 70PB 2.4 cluster (200 million 
> files and 300 million blocks), the image loading time was reduced from 1210 
> seconds to 1105 seconds, a reduction of about 9%.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
