[ https://issues.apache.org/jira/browse/DRILL-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16240724#comment-16240724 ]
ASF GitHub Bot commented on DRILL-4990: --------------------------------------- Github user ppadma commented on a diff in the pull request: https://github.com/apache/drill/pull/652#discussion_r149173203 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java --- @@ -151,17 +152,32 @@ public WorkspaceSchemaFactory( */ public boolean accessible(final String userName) throws IOException { final FileSystem fs = ImpersonationUtil.createFileSystem(userName, fsConf); + boolean tryListStatus = false; try { - // We have to rely on the listStatus as a FileSystem can have complicated controls such as regular unix style - // permissions, Access Control Lists (ACLs) or Access Control Expressions (ACE). Hadoop 2.7 version of FileSystem - // has a limited private API (FileSystem.access) to check the permissions directly - // (see https://issues.apache.org/jira/browse/HDFS-6570). Drill currently relies on Hadoop 2.5.0 version of - // FileClient. TODO: Update this when DRILL-3749 is fixed. - fs.listStatus(wsPath); + // access API checks if a user has certain permissions on a file or directory. + // returns normally if requested permissions are granted and throws an exception + // if access is denied. This API was added in HDFS 2.6 (see HDFS-6570). + // It is less expensive (than listStatus which was being used before) and hides the + // complicated access control logic underneath. + fs.access(wsPath, FsAction.READ); } catch (final UnsupportedOperationException e) { - logger.trace("The filesystem for this workspace does not support this operation.", e); + logger.debug("The filesystem for this workspace does not support access operation.", e); + tryListStatus = true; } catch (final FileNotFoundException | AccessControlException e) { - return false; + logger.debug("file {} not found or cannot be accessed", wsPath.toString(), e); + tryListStatus = true; --- End diff -- It can be trusted fine for regular file systems HDFS, MapR FS, Mac OS Extended File System etc. I am not sure if it works for windows or not. We were never able to figure out whether it is a problem with the test setup or it is actually not supported. Updated the diffs with your review comments taken care of. Please take a look. > Use new HDFS API access instead of listStatus to check if users have > permissions to access workspace. > ----------------------------------------------------------------------------------------------------- > > Key: DRILL-4990 > URL: https://issues.apache.org/jira/browse/DRILL-4990 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization > Affects Versions: 1.8.0 > Reporter: Padma Penumarthy > Assignee: Padma Penumarthy > > For every query, we build the schema tree > (runSQL->getPlan->getNewDefaultSchema->getRootSchema). All workspaces in all > storage plugins are checked and are added to the schema tree if they are > accessible by the user who initiated the query. For file system plugin, > listStatus API is used to check if the workspace is accessible or not > (WorkspaceSchemaFactory.accessible) by the user. The idea seem to be if the > user does not have access to file(s) in the workspace, listStatus will > generate an exception and we return false. But, listStatus (which lists all > the entries of a directory) is an expensive operation when there are large > number of files in the directory. A new API is added in Hadoop 2.6 called > access (HDFS-6570) which provides the ability to check if the user has > permissions on a file/directory. Use this new API instead of listStatus. -- This message was sent by Atlassian JIRA (v6.4.14#64029)