errose28 commented on pull request #2757:
URL: https://github.com/apache/ozone/pull/2757#issuecomment-953513749


   Thanks for the write up @sky76093016 and great job identifying the cause. 
This is a tricky bug in some very confusing code. Your analysis was very 
helpful in building my understanding of what is happening here. Let me know 
what you think of this summary:
   
   The code referenced here is all within `PipelinePlacementPolicy`. The test 
in question calls `chooseDatanodes`, which calls `filterViableNodes`. 
`filterViableNodes` returns the nodes sorted based on pipeline load. Given this 
sorted list, `filterViableNodes` then eventually calls `chooseNode` whether or 
not topology information is present (PipelinePlacementPolicy lines 240-249). 
The comments on these lines indicate two cases:
   
   **case 1**: If there is no topology information, choose a random node.
   **case 2**: If we have topology information, choose the anchor node as the 
one with lowest pipeline load (this would be the first node in the list 
returned by `filterViableNodes`), and choose the other two around it based on 
topology. 
   
   Pre HDDS-4710, `chooseNode` was always returning the first element in the 
list it was passed. This failed **case 1**, because it would always choose the 
same node in the absence of topology information. However, it passed **case 
2**, because the list passed to `chooseNode` was already sorted by pipeline 
usage, so returning the first element gave correct behavior.
   
   HDDS-4710 intended to fix **case 1** and leave **case 2** functioning 
correctly. In the PR description, it says
   > The other usages of chooseNode in class PipelinePlacementPolicy won't 
affect the topology & rack-awareness based node-choose policy
   
   Unfortunately the PR did not actually do this, because the same `chooseNode` 
method is eventually called for both cases, so it fixed **case 1** and broke 
**case 2** by choosing a random node when it was supposed to choose the node 
with lowest load.
   
   This test calculates the maximum pipelines that should be able to be created 
given the nodes and pipeline limit per node. Most of the time, the random 
selection actually picks optimal pipeline placements so the test passes. 
However, sometimes the random selection makes a suboptimal choice as you showed 
in your write up. This is why we see the intermittent failure. If we were 
choosing the anchor based on pipeline load, we would be able to place all the 
pipelines every time.
   
   The current fix in the PR works by reverting **case 2** to its original 
function. Since **case 2** is the only one used in this test, it now passes 
like before. **case 1** might be satisfied by this fix, because when the nodes 
are equal, passing the results of the shuffle into the sort on the following 
lines *might* result in a different order every time, depending on the sort 
implementation. I think a better fix is to just do what the comments in 
`PipelinePlacementPolicy` line 240 say. We can leave `chooseNode` unaltered so 
when `SCMCommonPlacementPolicy#getResultSet` calls it, it will return a random 
node as expected. In the overridden `PipelinePlacementPolicy#getResultSet`, we 
cannot call `chooseNode`. Instead, we can make a new method that does almost 
the same thing, but returns the first node in the list instead of a random one. 
Now **case 1** and **case 2** should be satisfied. Let me know what you think 
of this solution.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to