[
https://issues.apache.org/jira/browse/HDFS-17210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17772618#comment-17772618
]
ASF GitHub Bot commented on HDFS-17210:
---------------------------------------
GuoPhilipse commented on PR #6113:
URL: https://github.com/apache/hadoop/pull/6113#issuecomment-1750939980
> I think you want to reduce the probability of nodes with extremely high
storage usage being selected, which is very useful. But the current changes are
confusing me a bit, so I would like to discuss it with you. The meaning of
`toleranceLimit` is not very intuitive to me. The logic here couples the effect
of `balancedSpaceToleranceLimit` to the configuration value of
`balancedSpaceTolerance`. For example, if we configure `balancedSpaceTolerance`
to 10, nodeA's usage is 99%, and nodeB's usage is 90%, then even if
`balancedSpaceToleranceLimit` is configured to 93, nodeA will still be selected
without any doubt. I think this is not what we expected. I suggest that we can
directly use nodes with greater usage to make judgments. At the same time, we
can make the meaning of `toleranceLimit` more intuitive and no longer negate it
when using it. The changes are as follows: line215: `boolean toleranceLimit =
Math.max(a.getDfsUsedPercent(), b.getDfsUsedPercent()) <
balancedSpaceToleranceLimit;` Line 217: `if (a.equals(b) || (toleranceLimit &&
Math.abs(a.getDfsUsedPercent() - b.getDfsUsedPercent()) <
balancedSpaceTolerance) || ((isBalanceLocal && a.getDfsUsedPercent() < 50))) `
Please feel free to disagree. Looking forward to your reply.
Thanks @zhangshuyan0 for you review, It seems using min(A,B) only cover when
all node's usage are over balancedSpaceToleranceLimit, which does not fit for
lower node usage and higher node usage, so your suggest changes makes sense for
me.
> Optimize AvailableSpaceBlockPlacementPolicy
> -------------------------------------------
>
> Key: HDFS-17210
> URL: https://issues.apache.org/jira/browse/HDFS-17210
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs
> Affects Versions: 3.3.6
> Reporter: Fei Guo
> Priority: Minor
> Labels: pull-request-available
>
> for now ,we may have many nodes usage over 85%, some nodes'usage have been
> over 90%, if we choose three nodes as nodeA-97%,nodeB-98%,nodeC-99%, actually
> i don't want nodeC(99%) be choosen, for nodeC usage is really high , even
> random chance ,nodeC will reach 100% soon, so we can directly choose the less
> usage node(nodeA) if all nodes's usage are over 95%(just a example)
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]