Tim Armstrong created IMPALA-8522:
-------------------------------------

             Summary: Consider being smarter about deleting insert staging 
directories upon errors
                 Key: IMPALA-8522
                 URL: https://issues.apache.org/jira/browse/IMPALA-8522
             Project: IMPALA
          Issue Type: Improvement
          Components: Distributed Exec
    Affects Versions: Impala 3.2.0, Impala 3.1.0, Impala 2.12.0, Impala 3.0, 
Impala 2.11.0
            Reporter: Tim Armstrong


My investigation into IMPALA-7176 showed that hdfsClose() can be very slow if 
the file was deleted out from under the client.

Impala actually does this on the error cleanup path for inserts: 
Coordinator::FinalizeHdfsInsert() deletes the whole staging directory while 
fragments may still be running. This is a reasonable fallback in case a 
fragment goes missing (e.g. a node crash) or gets stuck permanently, but it 
leads to unnecessary errors otherwise. If fragments are still running, we 
could wait a few seconds to give them a chance to clean up after themselves.
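
A rough sketch of the "wait a few seconds, then delete anyway" idea, to make 
the proposal concrete. This is not Impala's actual coordinator code: 
FragmentTracker, FragmentDone(), WaitForFragments() and CleanUpStagingDir() are 
hypothetical names, and the delete step is passed in as a callback instead of 
calling HDFS directly.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>

// Hypothetical helper (an assumption, not an Impala class): counts fragment
// instances that are still running so the cleanup path can wait briefly.
class FragmentTracker {
 public:
  explicit FragmentTracker(int num_running) : num_running_(num_running) {}

  // Called once a fragment instance has finished and closed its files.
  void FragmentDone() {
    std::lock_guard<std::mutex> l(lock_);
    if (num_running_ > 0) --num_running_;
    if (num_running_ == 0) all_done_.notify_all();
  }

  // Blocks until every fragment is done or 'grace' elapses.
  // Returns true if all fragments finished within the grace period.
  bool WaitForFragments(std::chrono::milliseconds grace) {
    std::unique_lock<std::mutex> l(lock_);
    return all_done_.wait_for(l, grace, [this] { return num_running_ == 0; });
  }

 private:
  std::mutex lock_;
  std::condition_variable all_done_;
  int num_running_;
};

// Error-path cleanup: give fragments a bounded grace period, then delete the
// staging directory regardless, so a crashed or stuck fragment cannot block
// cleanup forever. Returns true if the delete happened after a clean shutdown.
bool CleanUpStagingDir(FragmentTracker& tracker,
                       std::chrono::milliseconds grace,
                       const std::function<void()>& delete_staging_dir) {
  bool all_finished = tracker.WaitForFragments(grace);
  delete_staging_dir();
  return all_finished;
}
```

The delete still happens unconditionally after the timeout, which keeps the 
existing fallback behaviour for crashed or stuck fragments; only the common 
case where fragments are about to finish on their own would change.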

Maybe this isn't worth the hassle, or it could be fixed on the HDFS side - I 
filed HDFS-14479.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)