steveloughran commented on pull request #2399: URL: https://github.com/apache/hadoop/pull/2399#issuecomment-724786879
Some more detail for the watchers from my testing (hadoop-trunk + CDP Spark 2.4). I could not get Spark master and Hadoop trunk to build together this week.

* RDD.saveAs needs to pass down the setting too: [https://issues.apache.org/jira/browse/SPARK-33402](https://issues.apache.org/jira/browse/SPARK-33402)
* I'm getting errors with FileSystem instantiation in Hive and the isolated classloader: [https://issues.apache.org/jira/browse/HADOOP-17372](https://issues.apache.org/jira/browse/HADOOP-17372). I'm not going near that other than to add a para in troubleshooting.md saying "you're in classloader hell". I will need to be testing against Spark master before worrying about WTF is going on there.

I'm also now worried that if anyone runs more than one job with the same dest dir and overwrite=true, there's a risk of hitting the same duplicate app attempt ID race condition. It's tempting just to do something ambitious, like using a random number to generate a timestamp for the cluster launch, or some random(year-month-day) + seconds-of-day, so that this problem goes away almost completely.
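For illustration only, the random(year-month-day) + seconds-of-day idea could look something like the sketch below. This is not what the PR or Spark does: the function name `random_job_timestamp`, the 2000-2099 date window, and the 13-character output format are all assumptions made up for this example.

```python
import random
from datetime import date, datetime, timedelta


def random_job_timestamp(rng=None, now=None):
    """Hypothetical helper: a random yyyyMMdd date followed by the real seconds-of-day."""
    if rng is None:
        rng = random.Random()
    if now is None:
        now = datetime.now()

    # Pick a random day in an arbitrary window (assumption: the years 2000-2099).
    start = date(2000, 1, 1)
    span_days = (date(2100, 1, 1) - start).days
    day = start + timedelta(days=rng.randrange(span_days))

    # Keep the real seconds-of-day so the value still varies with launch time.
    seconds_of_day = now.hour * 3600 + now.minute * 60 + now.second

    # 8 digits of date + 5 zero-padded digits of seconds (max 86399).
    return f"{day:%Y%m%d}{seconds_of_day:05d}"
```

Two jobs launched in the same second would then collide only if they also drew the same random day, which makes the clash much less likely but, as the comment says, does not eliminate it entirely.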
