[ https://issues.apache.org/jira/browse/IMPALA-8490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alex Rodoni closed IMPALA-8490.
-------------------------------
    Resolution: Fixed

> Impala Doc: the file handle cache now supports S3
> -------------------------------------------------
>
>                 Key: IMPALA-8490
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8490
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Docs
>            Reporter: Sahil Takiar
>            Assignee: Alex Rodoni
>            Priority: Major
>              Labels: in_33
>             Fix For: Impala 3.3.0
>
>
> https://impala.apache.org/docs/build/html/topics/impala_scalability.html
> states:
> {quote}
> Because this feature only involves HDFS data files, it does not apply to
> non-HDFS tables, such as Kudu or HBase tables, or tables that store their
> data on cloud services such as S3 or ADLS.
> {quote}
> This section should be updated because the file handle cache now supports S3
> files.
> We should add a section to the docs similar to what we added when support for
> remote HDFS files was added to the file handle cache:
> {quote}
> In Impala 3.2 and higher, file handle caching also applies to remote HDFS
> file handles. This is controlled by the cache_remote_file_handles flag for an
> impalad. It is recommended that you use the default value of true as this
> caching prevents your NameNode from overloading when your cluster has many
> remote HDFS reads.
> {quote}
> Like {{cache_remote_file_handles}}, the flag {{cache_s3_file_handles}} has
> been added as an impalad startup option (the flag is enabled by default).
> Unlike HDFS, though, S3 has no NameNode; the benefit instead is that caching
> eliminates a call to {{getFileStatus()}} on the target S3 file. So "prevents
> your NameNode from overloading when your cluster has many remote HDFS reads"
> should be changed to something like "avoids an unnecessary call to
> S3AFileSystem#getFileStatus(), which reduces the number of API calls made to
> S3."

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
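
The benefit described in the issue (a cached handle avoids a repeated getFileStatus() call) can be illustrated with a minimal Python sketch. This is not Impala's implementation: the classes CountingStore and FileHandleCache, the LRU eviction policy, the capacity, and the example s3a:// path are all hypothetical and exist only to show how reusing a cached handle keeps the number of status calls down.

```python
from collections import OrderedDict


class CountingStore:
    """Hypothetical stand-in for a remote object store; counts status calls."""

    def __init__(self):
        self.status_calls = 0

    def get_file_status(self, path):
        # Each call models one metadata request sent to the remote store.
        self.status_calls += 1
        return {"path": path}

    def open(self, path):
        # In this sketch, opening a file always performs a status lookup first.
        status = self.get_file_status(path)
        return {"status": status, "path": path}


class FileHandleCache:
    """Small LRU cache of open handles keyed by path (illustrative only)."""

    def __init__(self, store, capacity=2):
        self.store = store
        self.capacity = capacity
        self.handles = OrderedDict()

    def get(self, path):
        if path in self.handles:
            # Cache hit: reuse the handle, no status call is made.
            self.handles.move_to_end(path)
            return self.handles[path]
        # Cache miss: open the file, which costs one status call.
        handle = self.store.open(path)
        self.handles[path] = handle
        if len(self.handles) > self.capacity:
            # Evict the least recently used handle.
            self.handles.popitem(last=False)
        return handle


store = CountingStore()
cache = FileHandleCache(store)
for _ in range(3):
    cache.get("s3a://bucket/table/part-0.parq")  # hypothetical path
print(store.status_calls)  # 1
```

Three reads of the same file cost a single status call in this sketch; without the cache, each read would issue its own.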