[ https://issues.apache.org/jira/browse/HDFS-17573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17863842#comment-17863842 ]
ASF GitHub Bot commented on HDFS-17573:
---------------------------------------

Last-remote11 opened a new pull request, #6929:
URL: https://github.com/apache/hadoop/pull/6929

### Description of PR

The feature added in [HDFS-14617](https://issues.apache.org/jira/browse/HDFS-14617) (Improve FSImage load time by writing sub-sections to the FSImage index, by [Stephen O'Donnell](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=sodonnell)) makes loading the FSImage much faster.

However, this option cannot be enabled when dfs.image.compress=true is set. In my opinion, larger clusters need both settings at the same time.

For example, the cluster I use has approximately 6 million file system objects, and its FSImage is approximately 11GB with dfs.image.compress=true. If dfs.image.compress is turned off, the FSImage is expected to exceed 30GB, in which case transferring it from the standby to the active NameNode will take a long time and consume a lot of network bandwidth.

[HDFS-16147](https://issues.apache.org/jira/browse/HDFS-16147) (by [kinit](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=mofei)) showed that parallel FSImage loading and FSImage compression can be enabled at the same time. (It also worked well in my environment.)

I created this new Jira and PR because the discussion in [HDFS-16147](https://issues.apache.org/jira/browse/HDFS-16147) ended in 2021, and I would like the change to be officially included in the next release rather than remain in the Patch Available state.
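
To illustrate why the two settings need not conflict, here is a minimal, self-contained sketch. It is not the HDFS-16147 patch and uses no Hadoop API: the class `SubSectionImageSketch`, its `Entry`/`Image` types, and the use of GZIP are all hypothetical stand-ins. The assumption is that each sub-section is written as an independently finished compressed stream, so every offset recorded in the index is a valid point to start decompressing from, and loader threads can work on separate sub-sections.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Illustrative only; not Hadoop code. All names here are hypothetical. */
final class SubSectionImageSketch {

  /** Index entry: where one compressed sub-section lives in the image bytes. */
  static final class Entry {
    final int offset;
    final int length;
    Entry(int offset, int length) { this.offset = offset; this.length = length; }
  }

  /** A saved image: the compressed bytes plus the sub-section index. */
  static final class Image {
    final byte[] bytes;
    final List<Entry> index;
    Image(byte[] bytes, List<Entry> index) { this.bytes = bytes; this.index = index; }
  }

  /** Writes every sub-section as its own compressed stream and records its extent. */
  static Image save(List<String> subSections) throws IOException {
    ByteArrayOutputStream file = new ByteArrayOutputStream();
    List<Entry> index = new ArrayList<>();
    for (String section : subSections) {
      int start = file.size();
      // A fresh compressor per sub-section, finished before the next one starts,
      // so the recorded offset is a valid decompression entry point.
      GZIPOutputStream gz = new GZIPOutputStream(file);
      gz.write(section.getBytes(StandardCharsets.UTF_8));
      gz.finish();
      index.add(new Entry(start, file.size() - start));
    }
    return new Image(file.toByteArray(), index);
  }

  /** Decompresses one sub-section using only its own slice of the image (Java 9+). */
  static String inflate(byte[] bytes, Entry e) throws IOException {
    try (GZIPInputStream in =
        new GZIPInputStream(new ByteArrayInputStream(bytes, e.offset, e.length))) {
      return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }
  }

  /** Loads all sub-sections concurrently; results are returned in index order. */
  static List<String> loadParallel(Image image, int threads) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<String>> pending = new ArrayList<>();
      for (Entry e : image.index) {
        pending.add(pool.submit(() -> inflate(image.bytes, e)));
      }
      List<String> loaded = new ArrayList<>();
      for (Future<String> f : pending) {
        loaded.add(f.get());
      }
      return loaded;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    // The empty entry only checks that a zero-length sub-section round-trips here.
    List<String> sections = List.of("inode-batch-0", "", "inode-batch-2");
    Image image = save(sections);
    System.out.println(loadParallel(image, 2).equals(sections));
  }
}
```

The empty string in `main` only checks that a zero-length sub-section round-trips in this sketch; it is not a reproduction of the empty sub-section problem discussed in HDFS-16147.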

The actual code of the patch was written by [kinit](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=mofei); I resolved the empty sub-section problem (see the comments on [HDFS-16147](https://issues.apache.org/jira/browse/HDFS-16147)) and added test code.

If this is not the proper way to contribute, please let me know a better one. Thanks.

### How was this patch tested?

Added `testParallelSaveAndLoadWithCompression` and ran the unit tests.

### For code changes:

- [ ] Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
- [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
- [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?

> Add test code for FSImage parallelization and compression
> ---------------------------------------------------------
>
>                 Key: HDFS-17573
>                 URL: https://issues.apache.org/jira/browse/HDFS-17573
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs, namenode
>    Affects Versions: 3.4.1
>            Reporter: Sungdong Kim
>            Priority: Minor
>             Fix For: 3.4.1
>
>
> The feature added in HDFS-14617 (Improve FSImage load time by writing
> sub-sections to the FSImage index, by [Stephen
> O'Donnell|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=sodonnell])
> makes loading the FSImage much faster.
>
> However, this option cannot be enabled when dfs.image.compress=true is set.
> In my opinion, larger clusters need both settings at the same time.
> For example, the cluster I use has approximately 6 million file system
> objects, and its FSImage is approximately 11GB with dfs.image.compress=true.
> If dfs.image.compress is turned off, the FSImage is expected to exceed 30GB,
> in which case transferring it from the standby to the active NameNode will
> take a long time and consume a lot of network bandwidth.
>
> HDFS-16147 (by
> [kinit|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=mofei])
> showed that parallel FSImage loading and FSImage compression can be enabled
> at the same time. (It also worked well in my environment.)
> I created this new Jira and PR because the discussion in HDFS-16147 ended in
> 2021, and I would like the change to be officially included in the next
> release rather than remain in the Patch Available state.
> The actual code of the patch was written by
> [kinit|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=mofei];
> I resolved the empty sub-section problem (see the comments on HDFS-16147) and
> added test code.
> If this is not the proper way to contribute, please let me know a better one.
> Thanks.
>
> (A pull request will be attached.)

--
This message was sent by Atlassian Jira
(v8.20.10#820010)