[ 
https://issues.apache.org/jira/browse/IMPALA-5352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-5352.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.10.0

commit 57dae5ec7e927a1c836f6bf0a1cbe5a81541327e
Author: Joe McDonnell <[email protected]>
Date:   Thu Jun 29 13:08:58 2017 -0700

    IMPALA-5352: Age out unused file handles from the cache
    
    Currently, a file handle in the file handle cache will
    only be evicted if the cache reaches its capacity. This
    means that file handles can be retained for an indefinite
    amount of time. This is true even for files that have
    been deleted, replaced, or modified. Since a file handle
    maintains a file descriptor for local files, this can
    prevent the disk space from being freed. Additionally,
    unused file handles waste memory.
    
    This adds code to evict file handles that have been
    unused for longer than a specified threshold. A thread
    periodically checks the file handle cache to see if
    any file handle should be evicted. The threshold is
    specified by 'unused_file_handle_timeout_sec'; it
    defaults to 6 hours.
    
    This adds a test to custom_cluster/test_hdfs_fd_caching.py
    to verify the eviction behavior.
    
    Change-Id: Iefe04b3e2e22123ecb8b3e494934c93dfb29682e
    Reviewed-on: http://gerrit.cloudera.org:8080/7640
    Reviewed-by: Matthew Jacobs <[email protected]>
    Tested-by: Impala Public Jenkins
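
The eviction scheme described in the commit message can be sketched roughly as follows. This is a minimal illustration in Python, not Impala's actual C++ implementation: the names FileHandleCache, evict_unused, and start_eviction_thread are hypothetical, and keying entries by (path, mtime) is an assumption drawn from the issue description. The 6 hour default for 'unused_file_handle_timeout_sec' corresponds to 21600 seconds.

```python
import threading
import time
from collections import OrderedDict


class FileHandleCache:
    """Toy LRU cache that also ages out handles unused for too long.

    Hypothetical sketch; not the Impala implementation.
    """

    def __init__(self, capacity, unused_timeout_sec, clock=time.monotonic):
        self._capacity = capacity
        self._timeout = unused_timeout_sec
        self._clock = clock  # injectable so the behavior can be tested
        self._lock = threading.Lock()
        # Maps (path, mtime) -> (handle, last_used), least recently used first.
        self._entries = OrderedDict()

    def put(self, path, mtime, handle):
        key = (path, mtime)
        with self._lock:
            previous = self._entries.pop(key, None)
            if previous is not None:
                previous[0].close()
            self._entries[key] = (handle, self._clock())
            # Capacity-based eviction: the only eviction before this change.
            while len(self._entries) > self._capacity:
                _, (old_handle, _) = self._entries.popitem(last=False)
                old_handle.close()

    def get(self, path, mtime):
        key = (path, mtime)
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                return None
            handle, _ = entry
            # Refresh the last-used time and move the entry to the fresh end.
            self._entries[key] = (handle, self._clock())
            self._entries.move_to_end(key)
            return handle

    def evict_unused(self):
        """Close handles idle longer than the timeout; return how many."""
        evicted = 0
        with self._lock:
            now = self._clock()
            # Entries are ordered by last use, so stop at the first fresh one.
            while self._entries:
                key = next(iter(self._entries))
                handle, last_used = self._entries[key]
                if now - last_used <= self._timeout:
                    break
                del self._entries[key]
                handle.close()
                evicted += 1
        return evicted

    def __len__(self):
        with self._lock:
            return len(self._entries)


def start_eviction_thread(cache, interval_sec, stop_event):
    """Call cache.evict_unused() every interval_sec until stop_event is set."""
    def loop():
        while not stop_event.wait(interval_sec):
            cache.evict_unused()

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

In this sketch a handle stored under an old mtime is never looked up again once the file changes, so its last-used time stops advancing and the periodic check eventually closes it, even when the cache never reaches capacity.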


> File handle cache needs timeout based eviction
> ----------------------------------------------
>
>                 Key: IMPALA-5352
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5352
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.9.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>             Fix For: Impala 2.10.0
>
>
> The file handle cache currently keeps file handles open indefinitely if 
> the cache is not at its maximum capacity. This means that file handles might 
> stay around for extended periods of time (weeks, months). Since local files 
> are accessed directly, an open file handle can prevent the disk blocks from 
> being freed, even if the file is deleted through HDFS. The file handle cache 
> should implement a timeout for file handles so that a file handle that has 
> not been used recently is evicted. This limit should be configurable, and it 
> may be desirable for the default to take into account HDFS's fs.trash.interval.
> Additionally, when files are replaced or appended, the file's mtime will 
> increase. File handles with the old mtime will no longer be accessed, but 
> they may not be aged out of the cache. These should be aged out more 
> aggressively.
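
The disk space concern above follows from ordinary POSIX semantics: a deleted file's blocks are only released once the last open descriptor is closed. A small sketch, assuming a POSIX system and using a hypothetical helper name (read_after_unlink), shows that data remains readable through an open descriptor after the file has been unlinked:

```python
import os
import tempfile


def read_after_unlink(payload: bytes) -> bytes:
    """Write payload to a temp file, delete it, then read it back via the fd."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)  # the directory entry is gone...
        assert not os.path.exists(path)
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, len(payload))  # ...but the data is still on disk
    finally:
        os.close(fd)  # only now can the filesystem free the blocks
```

A cached handle that is never closed therefore pins the space of a file that HDFS considers deleted, which is the case a timeout is meant to bound.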



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
