[ 
https://issues.apache.org/jira/browse/SPARK-33402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated SPARK-33402:
-----------------------------------
    Summary: Jobs launched in same second have duplicate MapReduce JobIDs  
(was: Jobs launched in same second have duplicate JobIDs)

> Jobs launched in same second have duplicate MapReduce JobIDs
> ------------------------------------------------------------
>
>                 Key: SPARK-33402
>                 URL: https://issues.apache.org/jira/browse/SPARK-33402
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.8, 3.0.1, 3.1.0
>            Reporter: Steve Loughran
>            Priority: Major
>
> Spark uses the current timestamp to generate a MapReduce JobID.
> If more than one job attempt is generated in the same second, their JobIDs
> can clash.
> Committers which expect the JobID to be unique can then conflict with other
> jobs:
> * S3A staging committer (cluster FS staging dir and local task output dir)
> * Any committer which supports parallel jobs writing to the same destination
>   directory and requires unique names for the attempts
> * Code which uses the JobID as part of its algorithm to generate unique
>   filenames
> Note: {{HadoopMapReduceCommitProtocol.getFilename()}} doesn't use this JobID
> for uniqueness; it uses the task attempt ID and stage ID. It probably deserves
> its own audit.
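
The clash described above can be illustrated with a minimal sketch. This is
not Spark's code: the helper make_job_id, the second-granularity timestamp
format and the zero-padded numeric suffix are all assumptions, loosely modelled
on the job_<identifier>_<number> string form of a MapReduce JobID. It only
shows that an ID derived from a timestamp truncated to whole seconds cannot
tell apart two jobs started within the same second.

```python
from datetime import datetime


def make_job_id(now: datetime, number: int) -> str:
    """Hypothetical helper: build a JobID-like string from a timestamp.

    The identifier part is the timestamp truncated to whole seconds
    (an assumed format), so sub-second differences are discarded.
    """
    identifier = now.strftime("%Y%m%d%H%M%S")
    return f"job_{identifier}_{number:04d}"


# Two job attempts started 800 ms apart, within the same wall-clock second.
first_start = datetime(2020, 11, 10, 12, 0, 0, 100_000)
second_start = datetime(2020, 11, 10, 12, 0, 0, 900_000)

id_a = make_job_id(first_start, 0)
id_b = make_job_id(second_start, 0)

# Both attempts receive the same ID, so anything keyed on it
# (staging directories, attempt names, filenames) would collide.
clash = id_a == id_b
print(id_a, id_b, clash)
```

Any committer that derives a staging directory or attempt name from such an
ID would place both jobs in the same location, which is the conflict the
bullet list above describes.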



