jerqi commented on PR #1514:
URL: https://github.com/apache/incubator-uniffle/pull/1514#issuecomment-1947743242

   > > We can use TaskSchedulerImpl#nextTaskId to simplify the taskAttemptId.
   > > 
   > > * We use Java reflection to get TaskSchedulerImpl#nextTaskId from 
SparkContext.
   > > * We record the start taskAttemptId in the ShuffleHandle.
   > > * We use taskAttemptId minus start taskAttemptId.
   > 
   > Offsetting the taskAttemptId would help to reduce the ids for long-running 
Spark jobs, but the offset ids can in practice still be arbitrarily large, e.g. 
if there are many consecutive large stages (with many partitions).
   > 
   > What do you think about using the task id and the attempt no 
(bit-appended)? That id is unique within a stage and much more constrained than 
the taskAttemptId or the offset taskAttemptId: #1529
   > 
   > > > @jerqi Do you mean the RSS Spark clients or Apache Spark? Can you 
point me to it? What do you suggest to do next?
   > > 
   > > 
   > > I mean Apache Spark has its own commit mechanism.
   > 
   > Do you refer to 
[ShuffleMapOutputWriter](https://spark.apache.org/docs/latest/api/java/org/apache/spark/shuffle/api/ShuffleMapOutputWriter.html)?
   
   Encoding the attempt number will waste some bits. If we increase the number 
of bits per id, the bitmap will occupy more memory.
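   To illustrate the trade-off being discussed, here is a minimal sketch of the 
bit-appended id from #1529: the task index and the attempt number are packed 
into one long, with the attempt number occupying the low bits. The class and 
constant names are hypothetical and not the actual Uniffle or Spark API; the 
choice of 4 attempt bits is an assumption (e.g. spark.task.maxFailures well 
below 16). Every reserved attempt bit shifts the task index left, which is 
exactly why larger ids make the tracking bitmap occupy more memory.

```java
// Hypothetical sketch of a bit-appended per-stage id (illustrative names,
// not the real Uniffle/Spark implementation).
public class TaskIdEncoding {
    // Assumption: attempt numbers fit in ATTEMPT_BITS bits.
    static final int ATTEMPT_BITS = 4;
    static final long ATTEMPT_MASK = (1L << ATTEMPT_BITS) - 1;

    // Pack the task index and attempt number into a single long:
    // [ taskIndex | attemptNo ] with attemptNo in the low ATTEMPT_BITS bits.
    static long encode(long taskIndex, int attemptNo) {
        if (attemptNo < 0 || attemptNo > ATTEMPT_MASK) {
            throw new IllegalArgumentException(
                "attemptNo does not fit in " + ATTEMPT_BITS + " bits");
        }
        return (taskIndex << ATTEMPT_BITS) | attemptNo;
    }

    // Recover the two components from an encoded id.
    static long taskIndex(long encoded) {
        return encoded >>> ATTEMPT_BITS;
    }

    static int attemptNo(long encoded) {
        return (int) (encoded & ATTEMPT_MASK);
    }
}
```

   For example, task index 5 with attempt 3 encodes to (5 << 4) | 3 = 83; the 
maximum encoded id grows with the number of tasks in the stage rather than 
with the job-wide taskAttemptId.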


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
