zuston opened a new pull request, #2669:
URL: https://github.com/apache/uniffle/pull/2669

   ### What changes were proposed in this pull request?
   
   This is part 1 of the change and contains only the Uniffle client side. 
It lets the partition stats be stored on the shuffle-server side, which makes 
the integrity validation mechanism more stable. The shuffle-server side changes 
will be implemented in follow-up PRs.
   
   ### Why are the changes needed?
   
   With PR #2653, we can ensure data consistency end to end. However, the 
partition stats are stored on the Spark driver side. For normal Spark stages 
this design works well, but a stage with 100000 tasks and 10000 partitions 
overloads the Spark driver. In our cluster's Spark jobs, some huge jobs hang 
when getting the blockManagerIds, which can cost almost 20 minutes for a single 
reader task. That is unacceptable.
   
   Therefore, this PR stores the partition stats on the server side, in the 
same way the block IDs are already stored.
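
   To illustrate the idea only, here is a minimal sketch of a per-partition 
stats store. It assumes the stats are per-partition record counts reported by 
writer tasks and compared against what a reader actually read. The class 
`PartitionStatsStore` and its methods are hypothetical and are not the classes 
or RPCs added by this PR.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical illustration; not the actual Uniffle implementation.
// Assumption: the "partition stats" are record counts per partition.
final class PartitionStatsStore {
  // partitionId -> total number of records reported by all writer tasks
  private final Map<Integer, LongAdder> recordsByPartition = new ConcurrentHashMap<>();

  // Called by a writer task for each partition it wrote to.
  void report(int partitionId, long recordCount) {
    recordsByPartition
        .computeIfAbsent(partitionId, id -> new LongAdder())
        .add(recordCount);
  }

  // Total records expected for a partition; 0 if nothing was reported.
  long expectedRecords(int partitionId) {
    LongAdder adder = recordsByPartition.get(partitionId);
    return adder == null ? 0L : adder.sum();
  }

  // Called by a reader task after it has consumed a partition.
  boolean validate(int partitionId, long recordsRead) {
    return expectedRecords(partitionId) == recordsRead;
  }
}
```

   Keeping such a map on each shuffle server, rather than in the driver, means 
a reader only asks the servers that hold its partition, so the driver no longer 
has to hold or serve stats for every task and partition combination.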
   
   ### Does this PR introduce _any_ user-facing change?
   
   `spark.rss.client.integrityValidation.serverManagementEnabled=false`
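
   As a sketch, and assuming the `false` shown above is the default and that 
setting the option to `true` opts a job in to server-side management of the 
partition stats, it could be set in `spark-defaults.conf` like this:

```
# Assumption: "true" enables server-side storage of the partition stats
spark.rss.client.integrityValidation.serverManagementEnabled  true
```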
   
   ### How was this patch tested?
   
   Internal job tests.
   

