Last-remote11 opened a new pull request, #6929:
URL: https://github.com/apache/hadoop/pull/6929
<!--
Thanks for sending a pull request!
1. If this is your first time, please read our contributor guidelines:
https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute
2. Make sure your PR title starts with JIRA issue id, e.g.,
'HADOOP-17799. Your PR title ...'.
-->
### Description of PR
The feature added in [HDFS-14617](https://issues.apache.org/jira/browse/HDFS-14617) ("Improve FSImage load time by writing sub-sections to the FSImage index", by [Stephen O'Donnell](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=sodonnell)) makes loading the FSImage much faster.
However, this option cannot be enabled when `dfs.image.compress=true` is set.
In my opinion, larger clusters require both settings at the same time.
In my environment, the cluster I'm using has approximately 6 million file
system objects and the FSImage is approximately 11 GB with `dfs.image.compress=true`.
With `dfs.image.compress` turned off, the FSImage is expected to exceed 30 GB,
in which case transferring the FSImage from the standby to the active NameNode
takes a long time and consumes significant network bandwidth.
It was shown in [HDFS-16147](https://issues.apache.org/jira/browse/HDFS-16147)
(by [kinit](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=mofei))
that parallel FSImage loading and FSImage compression can be enabled at the
same time. (It also worked well in my environment.)
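For illustration only, here is a minimal sketch (not code from this patch) of how the two existing settings would be enabled together once the restriction is removed; the codec below is just an example choice.

```java
// Minimal sketch: enabling FSImage compression and parallel sub-section
// load/save together. Property names are the existing HDFS keys; the
// codec is only an example.
import org.apache.hadoop.conf.Configuration;

public class ImageConfSketch {
  public static Configuration buildConf() {
    Configuration conf = new Configuration();
    // Compress the FSImage on disk (existing option).
    conf.setBoolean("dfs.image.compress", true);
    conf.set("dfs.image.compression.codec",
        "org.apache.hadoop.io.compress.GzipCodec");
    // Write sub-sections to the FSImage index and load them in parallel
    // (HDFS-14617); with this change it can coexist with compression.
    conf.setBoolean("dfs.image.parallel.load", true);
    return conf;
  }
}
```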
I created this new Jira and PR because the discussion in
[HDFS-16147](https://issues.apache.org/jira/browse/HDFS-16147) ended in 2021,
and I would like this change to be officially included in the next release
instead of remaining in the Patch Available state.
The actual code of the patch was written by
[kinit](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=mofei); I
resolved the empty sub-section problem (see the comments on
[HDFS-16147](https://issues.apache.org/jira/browse/HDFS-16147)) and added test
code.
If this is not a proper method, please let me know another way to contribute.
Thanks.
### How was this patch tested?
Added `testParallelSaveAndLoadWithCompression` and ran the unit tests.
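For reference, the sketch below is only an approximation of the idea behind the test (the class name and exact steps are illustrative, not the code in this PR): enable both options, save the namespace, restart the NameNode, and verify the namespace loads correctly from the compressed, sub-sectioned image.

```java
// Illustrative outline only; not the exact test code in this PR.
import static org.junit.Assert.assertTrue;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.hdfs.protocol.HdfsConstants;
import org.junit.Test;

public class TestImageCompressionWithParallelLoadSketch {

  @Test
  public void testParallelSaveAndLoadWithCompression() throws Exception {
    Configuration conf = new Configuration();
    // Enable both options under test.
    conf.setBoolean("dfs.image.compress", true);
    conf.setBoolean("dfs.image.parallel.load", true);

    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
        .numDataNodes(1).build();
    try {
      cluster.waitActive();
      DistributedFileSystem fs = cluster.getFileSystem();
      fs.mkdirs(new Path("/test/dir"));

      // Force a new FSImage to be written with both options enabled.
      fs.setSafeMode(HdfsConstants.SafeModeAction.SAFEMODE_ENTER);
      fs.saveNamespace();
      fs.setSafeMode(HdfsConstants.SafeModeAction.SAFEMODE_LEAVE);

      // Restart the NameNode so it must load the compressed,
      // sub-sectioned image.
      cluster.restartNameNode();
      assertTrue(cluster.getFileSystem().exists(new Path("/test/dir")));
    } finally {
      cluster.shutdown();
    }
  }
}
```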
### For code changes:
- [ ] Does the title of this PR start with the corresponding JIRA issue id
(e.g. 'HADOOP-17799. Your PR title ...')?
- [ ] Object storage: have the integration tests been executed and the
endpoint declared according to the connector-specific documentation?
- [ ] If adding new dependencies to the code, are these dependencies
licensed in a way that is compatible for inclusion under [ASF
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`,
`NOTICE-binary` files?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]