zhli1142015 commented on issue #6995: URL: https://github.com/apache/incubator-gluten/issues/6995#issuecomment-2306881169
Thanks for reporting this. May I ask a few questions about it?

> Currently, we monitor TaskEnd/StageEnd events to update and clean up FilePartition and Executor data for duplicate reading.

Does this mean you use only part of the soft affinity (SA) logic, namely duplicate reading detection? Or do you not use SA at all, and SA just affects you?

> When the events eventually exceed the capacity defined by Spark and they will be discarded.

Yes. If some key events (SparkListenerStageCompleted) are discarded, the intermediate state is never cleared. This looks like a bug, and I can take a look.

> @zhli1142015 We may need to refact the code for duplicate reading or close the duplicated reading by default.

Here is the config that controls duplicate reading detection, which can be used to disable it:

spark.gluten.soft-affinity.duplicateReadingDetect.enabled
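To make the failure mode concrete, here is a minimal sketch in plain Python. It is not Gluten or Spark code: `BoundedBus`, `DuplicateReadTracker`, and `run` are hypothetical stand-ins, and the sketch simply assumes that a full event queue drops newly posted events. It shows how per-stage state leaks once the "completed" events that would clean it up are discarded.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str      # "submitted" or "completed" (stand-ins for listener events)
    stage_id: int


class BoundedBus:
    """Toy event queue that drops new events once it is full (assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()
        self.dropped = 0

    def post(self, event):
        if len(self.queue) >= self.capacity:
            self.dropped += 1      # the event is lost, the listener never sees it
            return False
        self.queue.append(event)
        return True

    def drain(self, listener):
        while self.queue:
            listener.on_event(self.queue.popleft())


class DuplicateReadTracker:
    """Hypothetical listener that keeps per-stage state until the stage completes."""

    def __init__(self):
        self.state = {}

    def on_event(self, event):
        if event.kind == "submitted":
            self.state[event.stage_id] = "tracking"
        elif event.kind == "completed":
            self.state.pop(event.stage_id, None)   # cleanup depends on this event


def run(capacity, stages):
    """Post submit and complete events for each stage, then deliver them.

    Returns (number of leaked state entries, number of dropped events).
    """
    bus = BoundedBus(capacity)
    tracker = DuplicateReadTracker()
    for stage_id in range(stages):
        bus.post(Event("submitted", stage_id))
    for stage_id in range(stages):
        bus.post(Event("completed", stage_id))
    bus.drain(tracker)
    return len(tracker.state), bus.dropped
```

With `run(8, 4)` every event fits and no state is left behind. With `run(6, 4)` the last two "completed" events are dropped, so two entries stay in the tracker indefinitely. Until that is fixed, detection can presumably be switched off by passing `--conf spark.gluten.soft-affinity.duplicateReadingDetect.enabled=false` to `spark-submit`, assuming the setting is a boolean flag as its name suggests.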
