Repository: incubator-impala
Updated Branches:
  refs/heads/master 2414d4c9c -> 73e41cea1
IMPALA-4611: Checking perms on S3 files is a very expensive no-op

We call getPermissions() on partition directories to find out whether Impala
has access to those files. On S3 this is currently a no-op, because the S3A
connector does not try to set or get permissions for S3 objects, so it always
returns the default set of permissions (777). The call still makes a round
trip to S3, however, which slows down the Catalog.

We can return the READ_WRITE permission immediately if we know we are
accessing an S3 file, thereby avoiding the round trip to S3 for every
partition. This will greatly speed up metadata operations for S3 tables and
partitions, which are already known to be a big bottleneck.

If and when the S3A connector becomes able to manage permissions, we will
need to revisit this code. However, since Impala does not currently support
permissions on S3, we might as well take the performance win.

Change-Id: If9d1072c185a6162727019cdf1cb34d7f3f1c75c
Reviewed-on: http://gerrit.cloudera.org:8080/5449
Reviewed-by: Sailesh Mukil <[email protected]>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/ffbdeda9
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/ffbdeda9
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/ffbdeda9

Branch: refs/heads/master
Commit: ffbdeda9469997dce6c6c3a80c78877486a3bdd9
Parents: 2414d4c
Author: Sailesh Mukil <[email protected]>
Authored: Fri Dec 9 15:12:25 2016 -0800
Committer: Impala Public Jenkins <[email protected]>
Committed: Tue Dec 13 01:56:28 2016 +0000

----------------------------------------------------------------------
 fe/src/main/java/org/apache/impala/catalog/HdfsTable.java | 10 ++++++++++
 1 file changed, 10 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ffbdeda9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
----------------------------------------------------------------------
diff --git a/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java b/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
index ae5e811..c2c569f 100644
--- a/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
+++ b/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
@@ -795,9 +795,19 @@ public class HdfsTable extends Table {
    * permissions Impala has on the given path. If the path does not exist, recurses up
    * the path until a existing parent directory is found, and inherit access permissions
    * from that.
+   * Always returns READ_WRITE for S3 files.
    */
   private TAccessLevel getAvailableAccessLevel(FileSystem fs, Path location)
       throws IOException {
+
+    // Avoid calling getPermissions() on file path for S3 files, as that makes a round
+    // trip to S3. Also, the S3A connector is currently unable to manage S3 permissions,
+    // so for now it is safe to assume that all files(objects) have READ_WRITE
+    // permissions, as that's what the S3A connector will always return too.
+    // TODO: Revisit if the S3A connector is updated to be able to manage S3 object
+    // permissions. (see HADOOP-13892)
+    if (FileSystemUtil.isS3AFileSystem(fs)) return TAccessLevel.READ_WRITE;
+
     FsPermissionChecker permissionChecker = FsPermissionChecker.getInstance();
     while (location != null) {
       if (fs.exists(location)) {
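
For illustration only (not part of the commit), the following is a minimal,
self-contained Java sketch of the same idea: short-circuit the permission
check when a path lives on an S3A filesystem. The class S3AccessShortcut,
its AccessLevel enum, and the example bucket path are hypothetical stand-ins
for Impala's HdfsTable and TAccessLevel; only the Hadoop FileSystem, Path,
and Configuration APIs are real, and the scheme check stands in for Impala's
FileSystemUtil.isS3AFileSystem() helper shown in the diff above.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical sketch; not the Impala implementation.
public class S3AccessShortcut {
  // Stand-in for Impala's TAccessLevel.
  enum AccessLevel { NONE, READ_ONLY, WRITE_ONLY, READ_WRITE }

  // Returns READ_WRITE immediately for S3A-backed paths, since the S3A
  // connector reports fixed (777) permissions anyway; skipping the check
  // avoids one round trip to S3 per partition directory.
  static AccessLevel availableAccessLevel(FileSystem fs, Path location)
      throws IOException {
    if ("s3a".equalsIgnoreCase(fs.getUri().getScheme())) {
      return AccessLevel.READ_WRITE;
    }
    // For other filesystems a real implementation would walk up 'location'
    // to the first existing ancestor and inspect its permissions; that part
    // is elided here.
    return fs.exists(location) ? AccessLevel.READ_WRITE : AccessLevel.NONE;
  }

  // Requires the hadoop-aws module and S3 credentials on the classpath to
  // actually instantiate an S3A filesystem.
  public static void main(String[] args) throws IOException {
    Path p = new Path("s3a://example-bucket/warehouse/tbl/part=1");
    FileSystem fs = p.getFileSystem(new Configuration());
    System.out.println(availableAccessLevel(fs, p));  // READ_WRITE, no S3 call
  }
}

The actual patch keeps the early return inside getAvailableAccessLevel() so
every caller that resolves partition directories benefits without change.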
