[ https://issues.apache.org/jira/browse/IMPALA-8428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838930#comment-16838930 ]

ASF subversion and git services commented on IMPALA-8428:
---------------------------------------------------------

Commit 7188ad32e6e94e90cbee492f3ad3628855f70925 in impala's branch 
refs/heads/master from Sahil Takiar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=7188ad3 ]

IMPALA-8428: Add support for caching file handles on s3

This patch is based on work done by Joe McDonnell. This change adds
support for caching file handles from S3. It adds a new configuration
flag 'cache_s3_file_handles' (set to true by default) which controls
whether or not caching of S3 file handles is enabled.

The S3 file handle cache is dependent on HADOOP-14747 (S3AInputStream to
implement CanUnbuffer). HADOOP-14747 adds support for hdfsUnbufferFile
to S3A streams. The call to unbuffer closes the underlying S3 object
stream. Without HADOOP-14747, the S3 file handle cache would quickly
cause an impalad to crash because every S3 file handle in the cache
would hold a dangling HTTP(S) connection open to S3.
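
As a rough illustration of why unbuffer matters for a handle cache, here
is a toy Python model. It is not Impala or Hadoop code: FakeS3Handle and
FileHandleCache are hypothetical names, and the LRU eviction policy and
capacity are assumptions made for the sketch. The only idea taken from
the description above is that a handle is unbuffered (its connection is
dropped) before it sits idle in the cache.

```python
from collections import OrderedDict


class FakeS3Handle:
    """Hypothetical stand-in for an S3A input stream; not a real Hadoop class."""

    open_connections = 0  # class-wide count of simulated HTTP(S) connections

    def __init__(self, path):
        self.path = path
        self.connected = False

    def read(self):
        # Lazily (re)open the simulated connection on read.
        if not self.connected:
            self.connected = True
            FakeS3Handle.open_connections += 1
        return f"data from {self.path}"

    def unbuffer(self):
        # Drop the connection but keep the handle object reusable.
        if self.connected:
            self.connected = False
            FakeS3Handle.open_connections -= 1


class FileHandleCache:
    """Toy LRU cache keyed by path; capacity and eviction are assumptions."""

    def __init__(self, capacity, cache_s3_file_handles=True):
        self.capacity = capacity
        self.enabled = cache_s3_file_handles
        self.handles = OrderedDict()

    def acquire(self, path):
        # Reuse a cached handle if present, otherwise create a new one.
        handle = self.handles.pop(path, None)
        return handle if handle is not None else FakeS3Handle(path)

    def release(self, handle):
        # Always unbuffer so an idle handle never pins a connection.
        handle.unbuffer()
        if not self.enabled:
            return
        self.handles[handle.path] = handle
        while len(self.handles) > self.capacity:
            self.handles.popitem(last=False)  # evict least recently released


cache = FileHandleCache(capacity=2)
for path in ["s3a://bucket/a", "s3a://bucket/b", "s3a://bucket/a"]:
    h = cache.acquire(path)
    h.read()
    cache.release(h)

print(len(cache.handles), FakeS3Handle.open_connections)  # prints: 2 0
```

In this model, removing the unbuffer() call from release() would leave
one simulated connection open per cached handle, which is the failure
mode described above.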

Testing:
* Modified test_hdfs_fd_caching.py so it is enabled for S3 as well as
remote HDFS
* Ran core tests
* Ran TPC-DS on a real cluster and validated that the S3 file handle
cache works as expected
* Ran several test queries on a real cluster with S3Guard enabled and
validated that the S3 file handle cache works as expected

Change-Id: I5b304d37bc724377fbe7955441cce0cec6fb7f19
Reviewed-on: http://gerrit.cloudera.org:8080/13221
Reviewed-by: Joe McDonnell <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Add support for caching file handles on s3
> ------------------------------------------
>
>                 Key: IMPALA-8428
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8428
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 3.3.0
>            Reporter: Joe McDonnell
>            Assignee: Sahil Takiar
>            Priority: Critical
>
> The file handle cache is currently disabled for S3, as the S3 connector 
> needed to implement proper unbuffer support. Now that 
> https://issues.apache.org/jira/browse/HADOOP-14747 is fixed, Impala should 
> provide an option to cache S3 file handles.
> This is particularly important for data caching, as accessing the data cache 
> happens after obtaining a file handle. If getting a file handle is slow, the 
> caching will be less effective.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

