cnauroth opened a new pull request, #3722: URL: https://github.com/apache/hive/pull/3722
### What changes were proposed in this pull request?

Improve logging in `DagUtils#localizeResource` to clarify root cause for why localizing a resource has failed.

### Why are the changes needed?

While creating a Tez session, `DagUtils#localizeResource` is responsible for copying the client's hive-exec.jar into HDFS (`hive.jar.directory`). This process can be triggered from multiple threads concurrently, in which case one thread performs the copy while the others wait, polling for arrival of the destination file. If there is an `IOException` during this process, it's assumed that the thread attempting the write failed, and all others abort. No information about the underlying `IOException` is logged. Instead, the log states "previous writer likely failed to write."

In some cases though, the `IOException` can occur on a polling thread for reasons unrelated to what happened in a writing thread. For example, in a production incident, the root cause was really that an external process had corrupted the copy of hive-exec.jar in `hive.jar.directory`, causing failure of the file length validation check in `DagUtils#checkPreExisting`. Since the logs didn't say anything about this, it made it much more difficult to troubleshoot.

This patch clarifies the logging by stating that a failure on the writing thread is just one possible reason for the error. It also logs the exception stack trace to make it easier to find the real root cause. This is a patch I ran to help recover from the production incident.

### Does this PR introduce _any_ user-facing change?

There is no behavior change, but it does change the logging output.

### How was this patch tested?

This patch was deployed as part of resolving the production incident that I mentioned. I was also able to create a reproduction in a test environment by externally overwriting the hive-exec.jar in `hive.jar.directory` to simulate the production incident.
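
For readers who want a picture of the logging pattern described above, here is a minimal, self-contained sketch. It is not the actual diff and does not use Hive or Hadoop classes: the class `LocalizeLoggingSketch`, the `LocalizeStep` interface, the message wording, and the example path are all hypothetical, and it uses `java.util.logging` only so that it runs without dependencies. Rethrowing with the original exception as the cause is likewise an assumption of the sketch. The point it illustrates is that the log entry names a failed writer as only one possible cause and carries the original `IOException` with its stack trace.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

public class LocalizeLoggingSketch {
    private static final Logger LOG =
        Logger.getLogger(LocalizeLoggingSketch.class.getName());

    /** Hypothetical stand-in for the copy-or-poll work that may fail. */
    interface LocalizeStep {
        void run() throws IOException;
    }

    /** Builds a message that does not assume the writing thread was at fault. */
    static String failureMessage(String dest) {
        return "Failed to localize resource to " + dest
            + ". A failed previous writer is one possible cause;"
            + " see the logged exception for the root cause.";
    }

    /** Runs the step; on failure, logs the full stack trace and rethrows with the cause. */
    static void localize(String dest, LocalizeStep step) throws IOException {
        try {
            step.run();
        } catch (IOException e) {
            String msg = failureMessage(dest);
            // Passing the exception makes the logger record its stack trace.
            LOG.log(Level.SEVERE, msg, e);
            // Keep the original exception as the cause so callers can inspect it.
            throw new IOException(msg, e);
        }
    }

    public static void main(String[] args) {
        try {
            // Simulated failure on a polling thread (hypothetical message and path).
            localize("/hypothetical/jars/hive-exec.jar", () -> {
                throw new IOException("destination file length does not match source");
            });
        } catch (IOException e) {
            System.out.println(e.getMessage());
            System.out.println("cause: " + e.getCause().getMessage());
        }
    }
}
```

With the cause attached and the stack trace logged, a failure such as a file length mismatch on a polling thread shows up directly in the log instead of being hidden behind a generic message about the writer.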
