Todd Lipcon created IMPALA-8569:
-----------------------------------
Summary: Periodically scrub deleted files from the file handle cache
Key: IMPALA-8569
URL: https://issues.apache.org/jira/browse/IMPALA-8569
Project: IMPALA
Issue Type: Improvement
Components: Backend
Reporter: Todd Lipcon
Currently, if you query a file and later delete it (e.g. by dropping the
partition or table), the file stays in the impalad's file handle cache.
Because the file is still open, its space can't be reclaimed on disk until the
impalad restarts or churns through its cache enough to drop the handle.
Typically this isn't a big deal in practice, since most files don't get deleted
shortly after being read, and the FH cache should cycle through after 6 hours
by default. Additionally, fixing it would be a bit of a pain, since we'd need to
add HDFS and libhdfs hooks so that HDFS can tell us whether the underlying
short-circuit FD has been unlinked, which probably also means adding JNI code
to let Java call fstat() and check st_nlink. Given that, I'm not sure it's worth
fixing, or whether we should just consider a shorter default expiry on the FH
cache.
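
For illustration only, here is a minimal sketch of the st_nlink check described above, written in Python rather than against the real HDFS, libhdfs, or Impala code. It assumes a POSIX local filesystem where unlinking an open file drops its link count to zero. The names is_unlinked and scrub_deleted, and the path-to-descriptor dict standing in for the file handle cache, are hypothetical and are not part of any existing API.

```python
import os
import tempfile


def is_unlinked(fd):
    """Return True if the open descriptor's file has no directory entries left."""
    return os.fstat(fd).st_nlink == 0


def scrub_deleted(cache):
    """Close and evict every cached descriptor whose file has been unlinked.

    `cache` is a hypothetical {path: fd} dict standing in for the FH cache.
    Returns the list of evicted paths.
    """
    evicted = []
    # Iterate over a snapshot so entries can be removed during the loop.
    for path, fd in list(cache.items()):
        if is_unlinked(fd):
            os.close(fd)  # releasing the last handle lets the disk space be reclaimed
            del cache[path]
            evicted.append(path)
    return evicted


def demo():
    """Open a file, delete it while it is still open, then scrub the cache."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "block.dat")
        with open(path, "wb") as f:
            f.write(b"x")
        fd = os.open(path, os.O_RDONLY)
        cache = {path: fd}
        before = is_unlinked(fd)  # file still has a directory entry
        os.unlink(path)           # simulate dropping the partition or table
        after = is_unlinked(fd)   # link count is now zero, fd still open
        evicted = scrub_deleted(cache)
        return before, after, evicted == [path] and not cache
```

A periodic scrub along these lines would call scrub_deleted on a timer. Note that a file with additional hard links keeps a nonzero st_nlink after one of them is removed, so this check only catches files that are fully unlinked.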