Last-remote11 opened a new pull request, #6929:
URL: https://github.com/apache/hadoop/pull/6929

   <!--
     Thanks for sending a pull request!
       1. If this is your first time, please read our contributor guidelines: 
https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute
       2. Make sure your PR title starts with JIRA issue id, e.g., 
'HADOOP-17799. Your PR title ...'.
   -->
   
   ### Description of PR
   
   The feature added in [HDFS-14617](https://issues.apache.org/jira/browse/HDFS-14617) ("Improve FSImage load time by writing sub-sections to the FSImage index", by [Stephen O'Donnell](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=sodonnell)) makes loading the FSImage much faster.
   
   However, this option cannot be enabled when `dfs.image.compress=true` is turned on.
   
   In my opinion, large clusters need both settings enabled at the same time.
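
   To make the combination concrete, here is a minimal Java sketch of the two settings involved. The configuration keys are the existing ones from `DFSConfigKeys`; being able to turn them on together is exactly what this PR is about.

   ```java
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.hdfs.DFSConfigKeys;

   public class ParallelCompressedImageConf {
     public static Configuration build() {
       Configuration conf = new Configuration();
       // Parallel FSImage save/load via sub-sections (HDFS-14617).
       conf.setBoolean(DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY, true);
       // FSImage compression; until now, enabling this meant the
       // parallel sub-sections could not be used.
       conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
       conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,
           "org.apache.hadoop.io.compress.GzipCodec");
       return conf;
     }
   }
   ```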
   
   In my environment, the cluster has approximately 6 million file system objects, and the FSImage is approximately 11 GB with `dfs.image.compress=true`. If the `dfs.image.compress` option is turned off, the image is expected to exceed 30 GB, in which case it takes a long time and significant network bandwidth to transfer the FSImage from the standby to the active NameNode.
   
   It was demonstrated in [HDFS-16147](https://issues.apache.org/jira/browse/HDFS-16147) (by [kinit](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=mofei)) that parallel FSImage loading and FSImage compression can be enabled at the same time. (It also worked well in my environment.)
   
   I created this new Jira and PR because the discussion in [HDFS-16147](https://issues.apache.org/jira/browse/HDFS-16147) ended in 2021, and I would like the change to be officially included in the next release instead of remaining in Patch Available.
   
   The actual code of the patch was written by [kinit](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=mofei); I resolved the empty sub-section problem (see the comment below on [HDFS-16147](https://issues.apache.org/jira/browse/HDFS-16147)) and added test code.
   
   If this is not the proper process, please let me know a better way to contribute.
   
   Thanks.
   
   ### How was this patch tested?
   
   Added `testParallelSaveAndLoadWithCompression` and ran the unit tests.
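
   For context, here is a rough sketch of the shape such a test can take, using `MiniDFSCluster`. This is an assumed outline, not the actual test added in this PR; the real assertions and namespace sizes differ.

   ```java
   import static org.junit.Assert.assertTrue;

   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.fs.Path;
   import org.apache.hadoop.hdfs.DFSConfigKeys;
   import org.apache.hadoop.hdfs.DistributedFileSystem;
   import org.apache.hadoop.hdfs.MiniDFSCluster;
   import org.apache.hadoop.hdfs.protocol.HdfsConstants.SafeModeAction;
   import org.junit.Test;

   public class TestParallelCompressedImageSketch {
     @Test
     public void testSaveAndLoadWithCompression() throws Exception {
       Configuration conf = new Configuration();
       conf.setBoolean(DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY, true);
       conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);

       MiniDFSCluster cluster =
           new MiniDFSCluster.Builder(conf).numDataNodes(1).build();
       try {
         cluster.waitActive();
         DistributedFileSystem fs = cluster.getFileSystem();
         // Create some namespace objects so the saved image has content.
         for (int i = 0; i < 100; i++) {
           fs.mkdirs(new Path("/dir-" + i));
         }
         // Checkpoint: enter safe mode, save the namespace, leave safe mode.
         fs.setSafeMode(SafeModeAction.SAFEMODE_ENTER);
         fs.saveNamespace();
         fs.setSafeMode(SafeModeAction.SAFEMODE_LEAVE);
         // Restarting the NameNode reloads the compressed, sub-sectioned
         // image that was just written.
         cluster.restartNameNode();
         assertTrue(cluster.getFileSystem().exists(new Path("/dir-0")));
       } finally {
         cluster.shutdown();
       }
     }
   }
   ```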
   
   ### For code changes:
   
   - [ ] Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

