SalvadorRomo opened a new pull request, #2547:
URL: https://github.com/apache/uniffle/pull/2547

   Title: [#2544][Bug] NPE about StatisticsCodec 
   
   ### What changes were proposed in this pull request?
   Wrap the `List<codeCost>` in a synchronized list.
   
   ### Why are the changes needed?
   This bug occurs in a concurrent environment. When Spark workers run with
   `RssShuffleManager` and the `spark.rss.client.io.compression.statisticsEnabled`
   property enabled, each one logs its compression statistics when it finishes.
   Because the class was not designed for concurrent access, reading the
   `List<codeCost>` from the `codec.statistics()` method runs into a race
   condition.
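
For illustration only, here is a minimal, self-contained sketch of the general technique, not the actual patch. The class `SynchronizedStatsSketch`, the record `CodecCost`, and the helper `recordConcurrently` are hypothetical names that do not come from the Uniffle code base. The background is that concurrent `add()` calls on a plain `ArrayList` can lose writes and leave `null` slots, which is one plausible source of an NPE when the list is later iterated. Wrapping the list with `Collections.synchronizedList` makes each `add()` atomic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SynchronizedStatsSketch {

  // Hypothetical stand-in for the per-call cost record kept by the codec.
  static final class CodecCost {
    final long durationMillis;

    CodecCost(long durationMillis) {
      this.durationMillis = durationMillis;
    }
  }

  // The wrapper serializes individual calls such as add() and size().
  private final List<CodecCost> costs =
      Collections.synchronizedList(new ArrayList<>());

  void record(long durationMillis) {
    costs.add(new CodecCost(durationMillis));
  }

  int size() {
    return costs.size();
  }

  // Iterating a synchronized list still requires holding its lock manually.
  long totalMillis() {
    synchronized (costs) {
      long sum = 0;
      for (CodecCost c : costs) {
        sum += c.durationMillis;
      }
      return sum;
    }
  }

  // Records from several threads at once and returns the final entry count.
  static int recordConcurrently(int threads, int perThread) {
    SynchronizedStatsSketch stats = new SynchronizedStatsSketch();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int t = 0; t < threads; t++) {
      pool.submit(() -> {
        for (int i = 0; i < perThread; i++) {
          stats.record(1L);
        }
      });
    }
    pool.shutdown();
    try {
      pool.awaitTermination(30, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return stats.size();
  }

  public static void main(String[] args) {
    // With the synchronized wrapper no add() is lost, so this prints 8000.
    System.out.println(recordConcurrently(8, 1000));
  }
}
```

Note that the wrapper only protects single calls. Any compound operation, such as the iteration in `totalMillis()`, has to synchronize on the list itself, as the `Collections.synchronizedList` documentation requires.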
   
   Fix: #2544
   
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   This issue was difficult to reproduce in a local environment. Instead, after
   applying the patch, I verified that the application keeps working as usual
   with these steps:
   1- Deploy the application following `./deploy/docker/read.me`.
   2- Launch spark-shell with the
   `spark.rss.client.io.compression.statisticsEnabled` property set, as follows:
   ```
   docker exec -it rss-spark-master-1 /opt/spark/bin/spark-shell \
      --master spark://rss-spark-master-1:7077 \
      --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
      --conf spark.shuffle.manager=org.apache.spark.shuffle.RssShuffleManager \
      --conf spark.rss.coordinator.quorum=rss-coordinator-1:19999,rss-coordinator-2:19999 \
      --conf spark.rss.storage.type=MEMORY_LOCALFILE \
      --conf spark.speculation=true \
      --conf spark.rss.client.io.compression.statisticsEnabled=true
   ```
   
   3- Run multiple Spark Scala jobs.
   4- When they finish, look on each worker for the logs in
   `/opt/spark/work/...`.
   5- Check that every entry in the log file successfully logs the statistics.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

