adoroszlai opened a new pull request, #3511:
URL: https://github.com/apache/ozone/pull/3511
## What changes were proposed in this pull request?
Attempt to fix a mismatch in the count of nodes selected by
`SCMContainerPlacementRackScatter`:
```
SCMException: Nodes size= 5, replication factor= 6 do not match
	at org.apache.hadoop.hdds.scm.pipeline.PipelineFactory.checkPipeline(PipelineFactory.java:106)
	at org.apache.hadoop.hdds.scm.pipeline.PipelineFactory.create(PipelineFactory.java:90)
	at org.apache.hadoop.hdds.scm.pipeline.PipelineManagerImpl.createPipeline(PipelineManagerImpl.java:195)
	at org.apache.hadoop.hdds.scm.pipeline.WritableECContainerProvider.allocateContainer(WritableECContainerProvider.java:172)
	at org.apache.hadoop.hdds.scm.pipeline.WritableECContainerProvider.getContainer(WritableECContainerProvider.java:155)
	at org.apache.hadoop.hdds.scm.pipeline.WritableECContainerProvider.getContainer(WritableECContainerProvider.java:51)
	at org.apache.hadoop.hdds.scm.pipeline.WritableContainerFactory.getContainer(WritableContainerFactory.java:59)
	at org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:176)
```
The exception comes from `PipelineFactory.checkPipeline`, which implies that
the placement policy itself did not find any problem with the number of nodes
selected:
https://github.com/apache/ozone/blob/3d623a8dd337b507558d5b8fab745f3b420d7b87/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java#L230-L240
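The failing check can be sketched as follows. This is an illustrative stand-in, not the actual Ozone code: the class, method, and exception type here are simplified, but the idea is the same, the pipeline's node count must equal the replication factor.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

public class CheckPipelineSketch {

  // Hypothetical version of the kind of check in PipelineFactory.checkPipeline:
  // reject a pipeline whose node count does not match the replication factor.
  static void checkPipeline(Collection<String> nodes, int replicationFactor) {
    if (nodes.size() != replicationFactor) {
      throw new IllegalStateException("Nodes size= " + nodes.size()
          + ", replication factor= " + replicationFactor + " do not match");
    }
  }

  public static void main(String[] args) {
    // 5 distinct nodes against a replication factor of 6 reproduces the
    // message from the stack trace above.
    Set<String> nodes = new HashSet<>(Arrays.asList("n1", "n2", "n3", "n4", "n5"));
    try {
      checkPipeline(nodes, 6);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```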
The placement policy returns a list, which allows duplicates, but the pipeline
stores nodes as keys in a map, thus discarding duplicates. Hence the node count
in the two places may differ.
https://github.com/apache/ozone/blob/3d623a8dd337b507558d5b8fab745f3b420d7b87/hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/pipeline/Pipeline.java#L508-L510
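A minimal sketch of the discrepancy, with made-up node IDs (the map-of-replica-indexes shape is a simplification of the pipeline's internal state): if the policy picks the same node twice, the list reports 6 nodes while the map keeps only 5 distinct keys.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateNodeDemo {

  // Count nodes the way the pipeline effectively does: as keys in a map,
  // so a node selected twice is stored only once.
  static int pipelineNodeCount(List<String> selectedNodes) {
    Map<String, Integer> replicaIndex = new LinkedHashMap<>();
    int i = 0;
    for (String node : selectedNodes) {
      replicaIndex.put(node, i++);  // duplicate key overwrites, does not add
    }
    return replicaIndex.size();
  }

  public static void main(String[] args) {
    // Hypothetical selection where the policy picks "n2" twice.
    List<String> selected = Arrays.asList("n1", "n2", "n2", "n3", "n4", "n5");
    System.out.println("Policy count:   " + selected.size());
    System.out.println("Pipeline count: " + pipelineNodeCount(selected));
  }
}
```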
This PR changes `SCMContainerPlacementRackScatter` to collect selected nodes
in a set, converting it to a list only at the last step. This way the
algorithm exits only when the required number of distinct nodes is found, or
throws an exception if that's not possible.
https://issues.apache.org/jira/browse/HDDS-6830
## How was this patch tested?
Existing tests pass. I couldn't reproduce the duplicate node scenario, so no
new test was added.
https://github.com/adoroszlai/hadoop-ozone/actions/runs/2488488385
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]