[jira] [Commented] (HDFS-17573) Add test code for FSImage parallelization and compression

ASF GitHub Bot (Jira) Mon, 08 Jul 2024 08:52:04 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-17573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17863842#comment-17863842
 ]


ASF GitHub Bot commented on HDFS-17573:
---------------------------------------

Last-remote11 opened a new pull request, #6929:
URL: https://github.com/apache/hadoop/pull/6929

   
   ### Description of PR
   
   The feature added in 
[HDFS-14617](https://issues.apache.org/jira/browse/HDFS-14617) (Improve 
FSImage load time by writing sub-sections to the FSImage index, by [Stephen 
O'Donnell](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=sodonnell)) 
makes loading the FSImage much faster.
   
   
   
   
   However, parallel loading cannot be enabled when dfs.image.compress=true is set.
   
   In my opinion, larger clusters require both settings at the same time.
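
   The interaction between the two settings can be modeled with a small stand-alone sketch. It does not use any Hadoop classes: `ImageLoadSettings` and its methods are hypothetical names made up for illustration, and a plain `Map` stands in for the NameNode configuration. Only the property names `dfs.image.compress` and `dfs.image.parallel.load` are taken from HDFS, and the "before" rule is a simplified reading of the restriction described above, not a copy of the NameNode code.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of how the two FSImage settings interact.
 * Not Hadoop code: a plain Map stands in for the configuration.
 */
final class ImageLoadSettings {
    static final String COMPRESS_KEY = "dfs.image.compress";
    static final String PARALLEL_LOAD_KEY = "dfs.image.parallel.load";

    /** Reads a boolean property, treating a missing key as false. */
    static boolean getBoolean(Map<String, String> conf, String key) {
        return Boolean.parseBoolean(conf.getOrDefault(key, "false"));
    }

    /** Restriction described above: compression switches parallel loading off. */
    static boolean parallelLoadBeforePatch(Map<String, String> conf) {
        return getBoolean(conf, PARALLEL_LOAD_KEY) && !getBoolean(conf, COMPRESS_KEY);
    }

    /** Goal of this change: parallel loading no longer depends on compression. */
    static boolean parallelLoadAfterPatch(Map<String, String> conf) {
        return getBoolean(conf, PARALLEL_LOAD_KEY);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(COMPRESS_KEY, "true");
        conf.put(PARALLEL_LOAD_KEY, "true");
        System.out.println("before patch, parallel load: " + parallelLoadBeforePatch(conf));
        System.out.println("after patch, parallel load:  " + parallelLoadAfterPatch(conf));
    }
}
```

   With both properties set to true, the sketch reports parallel loading as disabled under the old rule and enabled under the new one, which is the combination a large cluster would want.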
   
   In my environment, the cluster has approximately 6 million file 
system objects, and the FSImage is approximately 11GB with 
dfs.image.compress=true. If dfs.image.compress is turned off, the FSImage is 
expected to exceed 30GB, in which case transferring it from the standby to the 
active NameNode will take a long time and consume significant network bandwidth.
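
   To give a rough sense of the difference in transfer cost, the sketch below estimates the transfer time for the two image sizes. The link speed of 1 Gbit/s is an assumption chosen only for illustration (it was not measured on this cluster), decimal gigabytes are assumed, 30GB is the lower bound mentioned above, and `ImageTransferEstimate` is a hypothetical helper, not part of Hadoop.

```java
/** Hypothetical helper: back-of-the-envelope FSImage transfer time. */
final class ImageTransferEstimate {
    /** Seconds needed to move sizeBytes over a link with the given usable bits per second. */
    static double transferSeconds(long sizeBytes, double bitsPerSecond) {
        return sizeBytes * 8.0 / bitsPerSecond;
    }

    public static void main(String[] args) {
        long gb = 1_000_000_000L;       // decimal gigabyte (assumption)
        double linkBitsPerSecond = 1e9; // ASSUMPTION: 1 Gbit/s fully available for the transfer

        System.out.printf("compressed, 11 GB:   %.0f s%n",
                transferSeconds(11 * gb, linkBitsPerSecond));
        System.out.printf("uncompressed, 30 GB: %.0f s%n",
                transferSeconds(30 * gb, linkBitsPerSecond));
    }
}
```

   Under these assumptions the compressed image needs about 88 seconds and the uncompressed one at least 240 seconds; a throttled or shared link would stretch both figures proportionally.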
   
   
   
   
   [HDFS-16147](https://issues.apache.org/jira/browse/HDFS-16147) (by 
[kinit](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=mofei)) 
showed that parallel FSImage loading and FSImage compression can be enabled at 
the same time. (This also worked well in my environment.)
   
   I created this new Jira and PR because the discussion in 
[HDFS-16147](https://issues.apache.org/jira/browse/HDFS-16147) ended in 2021, 
and I would like the change to be officially included in the next release 
rather than remaining in the Patch Available state.
   
   The actual code of the patch was written by 
[kinit](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=mofei). 
I resolved the empty sub-section problem (see the comments in 
[HDFS-16147](https://issues.apache.org/jira/browse/HDFS-16147)) and added test 
code.
   
   If this is not the proper way to contribute, please let me know a better one.
   
   Thanks.
   
   ### How was this patch tested?
   
   Added `testParallelSaveAndLoadWithCompression` and ran the unit test.
   
   ### For code changes:
   
   - [ ] Does the title of this PR start with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   




> Add test code for FSImage parallelization and compression
> ---------------------------------------------------------
>
>                 Key: HDFS-17573
>                 URL: https://issues.apache.org/jira/browse/HDFS-17573
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs, namenode
>    Affects Versions: 3.4.1
>            Reporter: Sungdong Kim
>            Priority: Minor
>             Fix For: 3.4.1
>
>
> The feature added in HDFS-14617 (Improve FSImage load time by writing 
> sub-sections to the FSImage index, by [Stephen 
> O'Donnell|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=sodonnell]) 
> makes loading the FSImage much faster.
>  
> However, parallel loading cannot be enabled when dfs.image.compress=true is set.
> In my opinion, larger clusters require both settings at the same time.
> For example, the cluster I'm using has approximately 6 million file system 
> objects, and the FSImage is approximately 11GB with dfs.image.compress=true.
> If dfs.image.compress is turned off, the FSImage is expected to exceed 30GB, 
> in which case transferring it from the standby to the active NameNode will 
> take a long time and consume significant network bandwidth.
>  
> HDFS-16147 (by 
> [kinit|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=mofei]) 
> showed that parallel FSImage loading and FSImage compression can be enabled 
> at the same time. (This also worked well in my environment.)
> I created this new Jira and PR because the discussion in HDFS-16147 ended in 
> 2021, and I would like the change to be officially included in the next 
> release rather than remaining in the Patch Available state.
> The actual code of the patch was written by 
> [kinit|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=mofei]. 
> I resolved the empty sub-section problem (see the comments in HDFS-16147) and 
> added test code.
> If this is not the proper way to contribute, please let me know a better one.
> Thanks.
>  
> (pull request will be attached.)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
