Tim Armstrong created IMPALA-8522:
-------------------------------------
Summary: Consider being smarter about deleting insert staging
directories upon errors
Key: IMPALA-8522
URL: https://issues.apache.org/jira/browse/IMPALA-8522
Project: IMPALA
Issue Type: Improvement
Components: Distributed Exec
Affects Versions: Impala 3.2.0, Impala 3.1.0, Impala 2.12.0, Impala 3.0,
Impala 2.11.0
Reporter: Tim Armstrong
My investigation into IMPALA-7176 showed that hdfsClose() can be very
slow if the file was deleted out from under the client.
Impala actually does this on the error cleanup path for inserts -
Coordinator::FinalizeHdfsInsert() will delete the whole staging directory while
fragments may still be running. This isn't a bad fallback in case a fragment
goes missing (e.g. a node crash) or gets stuck permanently, but leads to
unnecessary errors otherwise. It seems like we could wait a few seconds if the
fragments are still running to give the fragments a chance to clean up after
themselves.
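
A rough sketch of the "wait a few seconds" idea, to make the proposal concrete. This is not Impala's actual code: FragmentTracker, FragmentDone(), WaitForFragments(), CleanupStagingDir() and the grace period parameter are all hypothetical names, and the actual staging directory deletion is left as a placeholder comment. It assumes the coordinator can be notified when each fragment finishes its own cleanup.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical tracker for fragments that may still be writing into the
// insert staging directory. Not part of Impala.
class FragmentTracker {
 public:
  explicit FragmentTracker(int num_fragments) : remaining_(num_fragments) {}

  // Called once per fragment after it has closed and cleaned up its files.
  void FragmentDone() {
    std::lock_guard<std::mutex> l(lock_);
    assert(remaining_ > 0);
    if (--remaining_ == 0) done_cv_.notify_all();
  }

  // Blocks until every fragment is done or the grace period elapses.
  // Returns true if all fragments finished in time.
  bool WaitForFragments(std::chrono::milliseconds grace_period) {
    std::unique_lock<std::mutex> l(lock_);
    return done_cv_.wait_for(l, grace_period,
                             [this] { return remaining_ == 0; });
  }

 private:
  std::mutex lock_;
  std::condition_variable done_cv_;
  int remaining_;
};

// Error-path cleanup sketch. Returns true if the directory had to be deleted
// while fragments were still running (the fallback for crashed or stuck
// fragments), false if all fragments had already cleaned up.
bool CleanupStagingDir(FragmentTracker& tracker,
                       std::chrono::milliseconds grace_period) {
  bool all_done = tracker.WaitForFragments(grace_period);
  // Placeholder: delete the staging directory here in either case.
  return !all_done;
}
```

The unconditional delete is kept as the fallback, so a missing or stuck fragment only delays cleanup by the grace period rather than blocking it.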
Maybe this isn't worth the hassle or it could be fixed on the HDFS side - I
filed HDFS-14479.