adoroszlai opened a new pull request, #3511:
URL: https://github.com/apache/ozone/pull/3511

   ## What changes were proposed in this pull request?
   
   Attempt to fix a mismatch between the number of nodes selected by 
`SCMContainerPlacementRackScatter` and the required replication factor:
   
   ```
   SCMException: Nodes size= 5, replication factor= 6 do not match
        at org.apache.hadoop.hdds.scm.pipeline.PipelineFactory.checkPipeline(PipelineFactory.java:106)
        at org.apache.hadoop.hdds.scm.pipeline.PipelineFactory.create(PipelineFactory.java:90)
        at org.apache.hadoop.hdds.scm.pipeline.PipelineManagerImpl.createPipeline(PipelineManagerImpl.java:195)
        at org.apache.hadoop.hdds.scm.pipeline.WritableECContainerProvider.allocateContainer(WritableECContainerProvider.java:172)
        at org.apache.hadoop.hdds.scm.pipeline.WritableECContainerProvider.getContainer(WritableECContainerProvider.java:155)
        at org.apache.hadoop.hdds.scm.pipeline.WritableECContainerProvider.getContainer(WritableECContainerProvider.java:51)
        at org.apache.hadoop.hdds.scm.pipeline.WritableContainerFactory.getContainer(WritableContainerFactory.java:59)
        at org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:176)
   ```
   
   The exception comes from `PipelineFactory.checkPipeline`, which implies that 
the placement policy itself did not find any problem with the number of nodes 
selected:
   
   
https://github.com/apache/ozone/blob/3d623a8dd337b507558d5b8fab745f3b420d7b87/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementRackScatter.java#L230-L240
   
   The placement policy returns a list, which allows duplicates, but the 
pipeline stores nodes as keys in a map, which discards duplicates.  Hence the 
node count may differ between the two places.
   
   
https://github.com/apache/ozone/blob/3d623a8dd337b507558d5b8fab745f3b420d7b87/hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/pipeline/Pipeline.java#L508-L510
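
The list-versus-map difference can be illustrated with plain collections. This is only a minimal sketch, not the actual `Pipeline` code: node names are plain strings, and `DuplicateNodeCountSketch`, `toNodeStatus` and the `-1L` placeholder value are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DuplicateNodeCountSketch {

  // Hypothetical stand-in for a pipeline that keeps nodes as map keys:
  // putting the same key twice leaves a single entry.
  static Map<String, Long> toNodeStatus(List<String> selected) {
    Map<String, Long> nodeStatus = new LinkedHashMap<>();
    for (String node : selected) {
      nodeStatus.put(node, -1L);
    }
    return nodeStatus;
  }

  public static void main(String[] args) {
    // Six selections, but "dn3" appears twice.
    List<String> selected =
        Arrays.asList("dn1", "dn2", "dn3", "dn4", "dn5", "dn3");
    Map<String, Long> nodeStatus = toNodeStatus(selected);

    System.out.println("list size = " + selected.size());
    System.out.println("map size = " + nodeStatus.size());
  }
}
```

With one duplicate among six selections, the list reports 6 entries while the map holds 5 keys, which matches the sizes in the exception message above.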
   
   This PR changes `SCMContainerPlacementRackScatter` to use a set for 
collecting selected nodes; the set is converted to a list only at the last 
step.  This way the algorithm exits only when the required number of distinct 
nodes has been found, or throws an exception if that's not possible.
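
The idea of collecting into a set and converting to a list only at the end might look roughly like the sketch below. It is a simplified illustration under stated assumptions, not the code in this PR: the real policy chooses nodes rack by rack and throws `SCMException`, whereas here `DistinctNodeSelectionSketch`, `chooseDistinct`, the string node names and the `IllegalStateException` are all hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class DistinctNodeSelectionSketch {

  // Walks the candidate sequence (which may repeat nodes) and stops only
  // once 'required' distinct nodes have been collected.
  static List<String> chooseDistinct(List<String> candidates, int required) {
    Set<String> chosen = new LinkedHashSet<>();
    for (String node : candidates) {
      if (chosen.size() == required) {
        break;
      }
      chosen.add(node); // adding a duplicate does not grow the set
    }
    if (chosen.size() < required) {
      // Assumption: the real policy would throw SCMException here.
      throw new IllegalStateException("Found only " + chosen.size()
          + " distinct nodes, required " + required);
    }
    // Convert to a list only at the last step.
    return new ArrayList<>(chosen);
  }

  public static void main(String[] args) {
    List<String> candidates = Arrays.asList("dn1", "dn2", "dn2", "dn3");
    System.out.println(chooseDistinct(candidates, 3));
  }
}
```

Because the exit condition checks the size of the set rather than the number of selections made, a repeated candidate can no longer satisfy the required count.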
   
   https://issues.apache.org/jira/browse/HDDS-6830
   
   ## How was this patch tested?
   
   Existing tests pass.  I couldn't reproduce the duplicate node scenario, so 
no new test was added.
   
   https://github.com/adoroszlai/hadoop-ozone/actions/runs/2488488385


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

