[GitHub] [hadoop] steveloughran commented on pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

GitBox Tue, 10 Nov 2020 07:46:26 -0800


steveloughran commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-724786879



   Some more detail for the watchers from my testing (Hadoop trunk + CDP Spark 2.4). I couldn't get Spark master and Hadoop trunk to build together this week.
   
   * `RDD.saveAs` needs to pass down the setting too: [https://issues.apache.org/jira/browse/SPARK-33402](https://issues.apache.org/jira/browse/SPARK-33402)
   * I'm getting errors with `FileSystem` instantiation in Hive and the isolated classloader: [https://issues.apache.org/jira/browse/HADOOP-17372](https://issues.apache.org/jira/browse/HADOOP-17372).
   
   I'm not going near that beyond adding a paragraph to troubleshooting.md saying "you're in classloader hell". I'll need to be testing against Spark master before worrying about WTF is going on there.
   
   I'm also now worried that if anyone runs more than one job with the same destination directory and overwrite=true, there's a risk of hitting the same duplicate-app-attempt-ID race condition. It's tempting to do something ambitious, like using a random number to generate a timestamp for the cluster launch, or some random(year-month-day) + seconds-of-day, so that this problem almost completely goes away.
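   A minimal sketch of the "random(year-month-day) + seconds-of-day" idea, assuming the collision comes from two jobs deriving their IDs from the same launch-timestamp string. The function name `pseudo_launch_timestamp` and the 13-digit layout are hypothetical illustrations, not part of Hadoop or Spark:

```python
import random
from datetime import datetime


def pseudo_launch_timestamp(now: datetime, rng: random.Random) -> str:
    """Hypothetical sketch: a timestamp-shaped ID whose date part is random.

    The real launch date is replaced by a random year-month-day, while the
    real seconds-of-day is kept, so the string still looks like a timestamp
    but two jobs launched in the same second rarely share it.
    """
    year = rng.randint(1000, 9999)   # random 4-digit "year"
    month = rng.randint(1, 12)       # random month
    day = rng.randint(1, 28)         # capped at 28 so every month is valid
    seconds_of_day = now.hour * 3600 + now.minute * 60 + now.second  # 0..86399
    return f"{year:04d}{month:02d}{day:02d}{seconds_of_day:05d}"


if __name__ == "__main__":
    launch = datetime.now()
    # Two jobs launched in the same second, each with its own RNG.
    print(pseudo_launch_timestamp(launch, random.Random()))
    print(pseudo_launch_timestamp(launch, random.Random()))
```

   Two jobs launched in the same second collide only if their random dates also match, which is one chance in 9000 × 12 × 28 = 3,024,000; that is roughly what "almost completely" buys.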


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


