EnricoMi commented on PR #1514: URL: https://github.com/apache/incubator-uniffle/pull/1514#issuecomment-1945642831
> We can use TaskSchedulerImpl#nextTaskId to simplify the taskAttemptId.
>
> - We use Java reflection to get TaskSchedulerImpl#nextTaskId from SparkContext.
> - We record the start taskAttemptId in the ShuffleHandle.
> - We use taskAttemptId minus start taskAttemptId.

Offsetting the taskAttemptId would help to reduce the ids for long-running Spark jobs, but the offset ids can in practice still grow arbitrarily large, e.g. when there are many consecutive large stages (with many partitions). What do you think about using the task id and the attempt number (bit-appended)? That id is unique within a stage and much more constrained than the taskAttemptId or the offset taskAttemptId.

> > @jerqi Do you mean the RSS Spark clients or Apache Spark? Can you point me to it? What do you suggest to do next?
>
> I mean Apache Spark has its own commit mechanism.

Do you refer to [ShuffleMapOutputWriter](https://spark.apache.org/docs/latest/api/java/org/apache/spark/shuffle/api/ShuffleMapOutputWriter.html)?
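To illustrate the bit-appending idea, here is a minimal, hypothetical sketch. The class name `TaskAttemptPacker`, the 16-bit attempt field, and the method names are all assumptions for illustration, not the PR's actual implementation; Uniffle's real layout may reserve different bit widths.

```java
// Hypothetical sketch: pack a per-stage task index and its attempt number
// into one long by bit-appending. Assumes attempt numbers stay well below
// 2^ATTEMPT_BITS (Spark's spark.task.maxFailures is small, typically 4).
public class TaskAttemptPacker {
    private static final int ATTEMPT_BITS = 16;
    private static final long ATTEMPT_MASK = (1L << ATTEMPT_BITS) - 1;

    /** Combine a task index (unique within a stage) with its attempt number. */
    public static long pack(long taskIndex, int attemptNo) {
        return (taskIndex << ATTEMPT_BITS) | (attemptNo & ATTEMPT_MASK);
    }

    /** Recover the task index from a packed id. */
    public static long taskIndex(long packed) {
        return packed >>> ATTEMPT_BITS;
    }

    /** Recover the attempt number from a packed id. */
    public static int attemptNo(long packed) {
        return (int) (packed & ATTEMPT_MASK);
    }

    public static void main(String[] args) {
        long packed = pack(42, 3);
        System.out.println(packed);            // 42 * 65536 + 3 = 2752515
        System.out.println(taskIndex(packed)); // 42
        System.out.println(attemptNo(packed)); // 3
    }
}
```

Because the task index is bounded by the stage's partition count and the attempt number by the retry limit, the packed id is bounded per stage, unlike a monotonically growing taskAttemptId.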
