[ 
https://issues.apache.org/jira/browse/IMPALA-8490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni closed IMPALA-8490.
-------------------------------
    Resolution: Fixed

> Impala Doc: the file handle cache now supports S3
> -------------------------------------------------
>
>                 Key: IMPALA-8490
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8490
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Docs
>            Reporter: Sahil Takiar
>            Assignee: Alex Rodoni
>            Priority: Major
>              Labels: in_33
>             Fix For: Impala 3.3.0
>
>
> https://impala.apache.org/docs/build/html/topics/impala_scalability.html 
> states:
> {quote}
> Because this feature only involves HDFS data files, it does not apply to 
> non-HDFS tables, such as Kudu or HBase tables, or tables that store their 
> data on cloud services such as S3 or ADLS.
> {quote}
> This section should be updated because the file handle cache now supports S3 
> files.
> We should add a section to the docs similar to what we added when support for 
> remote HDFS files was added to the file handle cache:
> {quote}
> In Impala 3.2 and higher, file handle caching also applies to remote HDFS 
> file handles. This is controlled by the cache_remote_file_handles flag for an 
> impalad. It is recommended that you use the default value of true as this 
> caching prevents your NameNode from overloading when your cluster has many 
> remote HDFS reads.
> {quote}
> Like {{cache_remote_file_handles}}, the flag {{cache_s3_file_handles}} has 
> been added as an impalad startup option (the flag is enabled by default).
> Unlike HDFS, S3 has no NameNode; the benefit here is that caching eliminates 
> a call to {{getFileStatus()}} on the target S3 file. So "prevents your 
> NameNode from overloading when your cluster has many remote HDFS reads" 
> should be changed to something like "avoids an unnecessary call to 
> S3AFileSystem#getFileStatus(), which reduces the number of API calls made to 
> S3."
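>
> For illustration only, a minimal sketch of how the two flags named above 
> might appear on an impalad command line. It assumes the usual gflags-style 
> {{--flag=value}} syntax for startup options and leaves out every other 
> option a real deployment would pass. Both values shown are the defaults, so 
> setting them explicitly only matters if they were overridden elsewhere:
>
> ```
> # Assumption: gflags-style --flag=value syntax; all other startup options omitted.
> impalad --cache_remote_file_handles=true --cache_s3_file_handles=true
> ```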



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
