[ https://issues.apache.org/jira/browse/IMPALA-7265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16729924#comment-16729924 ]
ASF subversion and git services commented on IMPALA-7265:
---------------------------------------------------------
Commit a3eb5fa90cf721e82a6f5d0aa7edf217be7ef3a1 in impala's branch
refs/heads/master from Joe McDonnell
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=a3eb5fa ]

IMPALA-7265: Add parameter to cache remote HDFS file handles

Currently, the file handle cache does not apply to remote HDFS
files. This adds a parameter 'cache_remote_file_handles' that
enables the file handle cache for remote HDFS files. It is
currently being tested, so it is set to false by default.

This does not change the behavior for S3, ADLS, or ABFS.

Change-Id: I549f007432a01ca52fa8093d458a220bba02e1d9
Reviewed-on: http://gerrit.cloudera.org:8080/12111
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Philip Zeyliger <[email protected]>
> Cache remote file handles
> -------------------------
>
> Key: IMPALA-7265
> URL: https://issues.apache.org/jira/browse/IMPALA-7265
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 3.1.0
> Reporter: Joe McDonnell
> Assignee: Joe McDonnell
> Priority: Critical
>
> The file handle cache currently does not allow caching remote file handles.
> This means that clusters that have a lot of remote reads can suffer from
> overloading the NameNode. Impala should be able to cache remote file handles.
> There are some open questions about remote file handles and whether they
> behave differently from local file handles. In particular:
> 1. Is there any resource constraint on the number of remote file handles
>    open? (e.g. do they maintain a network connection?)
> 2. Are there any semantic differences in how remote file handles behave when
>    files are deleted, overwritten, or appended?
> 3. Are there any extra failure cases for remote file handles? (i.e. if a
>    machine goes down or a remote file handle is left open for an extended
>    period of time)
> The form of caching will depend on the answers, but at the very least, it
> should be possible to cache a remote file handle at the level of a query so
> that a Parquet file with multiple columns can share file handles.
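
As a rough illustration of the idea discussed above (not Impala's actual
implementation, which is C++), here is a minimal Python sketch of a file
handle cache whose remote-file behavior is gated by a boolean. Only the name
cache_remote_file_handles comes from the commit message; the class
FileHandleCache, the opener callback, the (path, mtime) key, the capacity
limit, and the is_remote argument are all hypothetical and chosen for the
example. A real cache would also need to close evicted handles and address
the open questions about deleted, overwritten, or appended files.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class FileHandleCache:
    """Toy LRU cache of open file handles (hypothetical, for illustration)."""

    def __init__(self, opener: Callable[[str], object], capacity: int = 128,
                 cache_remote_file_handles: bool = False):
        self._opener = opener              # stands in for an HDFS open call
        self._capacity = capacity
        self._cache_remote = cache_remote_file_handles
        self._handles: "OrderedDict[Hashable, object]" = OrderedDict()

    def get(self, path: str, mtime: int, is_remote: bool) -> object:
        # Remote files bypass the cache unless the flag is enabled.
        if is_remote and not self._cache_remote:
            return self._opener(path)
        # Keying on mtime means a rewritten file gets a fresh handle.
        key = (path, mtime)
        if key in self._handles:
            self._handles.move_to_end(key)  # mark as most recently used
            return self._handles[key]
        handle = self._opener(path)
        self._handles[key] = handle
        if len(self._handles) > self._capacity:
            self._handles.popitem(last=False)  # evict least recently used
        return handle


if __name__ == "__main__":
    opens = []

    def fake_open(path: str) -> object:
        opens.append(path)  # each entry models one round trip to the NameNode
        return object()

    cache = FileHandleCache(fake_open, cache_remote_file_handles=True)
    # Three column scans of the same remote Parquet file share one handle.
    for _column in range(3):
        cache.get("/warehouse/t/part-0.parq", mtime=1, is_remote=True)
    print(len(opens))  # prints 1
```

With cache_remote_file_handles left at False, the same loop would call the
opener three times, which is the NameNode load the issue describes.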