[
https://issues.apache.org/jira/browse/DRILL-6100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16333381#comment-16333381
]
Sorabh Hamirwasia commented on DRILL-6100:
------------------------------------------
Investigation shows that Parquet data is sometimes accessed as the query user
instead of the view owner, which causes FileSystem to throw an
AccessControlException while trying to open the file. At planning time Drill
tries to create the Parquet metadata cache by reading the footer of the data
file. The readFooter API in the Parquet library gets the FileSystem instance
from a cache based on the URI of the file path. When it looks into the cache
it creates a key that includes the current user, obtained from
UserGroupInformation.getCurrentUser(), and this call returns a different user
at different times (queryUser, viewOwner or ProcessUser). To make sure this
call always returns the view owner, Drill should call the [readFooter
API|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java#L421]
in a doAs block of the view owner's UGI.
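
The cache behaviour described above can be illustrated with a minimal,
stdlib-only sketch. It is not Drill, Parquet or Hadoop code: FsCacheSketch,
Fs, getFileSystem, doAs and readFooterAsViewOwner are hypothetical stand-ins,
a thread-local string plays the role of the user reported by
UserGroupInformation.getCurrentUser(), and the rule that only "viewOwner" may
read the file is an assumption made for the example. In Hadoop itself the
wrapping would presumably be done with UserGroupInformation.doAs and a
PrivilegedExceptionAction.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical stand-in classes; none of these names come from Drill, Parquet or Hadoop.
class FsCacheSketch {
    // Plays the role of the identity reported by UserGroupInformation.getCurrentUser().
    private static final ThreadLocal<String> CURRENT_USER =
            ThreadLocal.withInitial(() -> "processUser");

    static String currentUser() {
        return CURRENT_USER.get();
    }

    // Runs the action with `user` as the current user, then restores the previous one.
    static <T> T doAs(String user, Supplier<T> action) {
        String previous = CURRENT_USER.get();
        CURRENT_USER.set(user);
        try {
            return action.get();
        } finally {
            CURRENT_USER.set(previous);
        }
    }

    // A file system handle bound to the user that was current when it was created.
    static final class Fs {
        final String user;

        Fs(String user) {
            this.user = user;
        }

        // Assumption for the sketch: only "viewOwner" may read the data file.
        String readFooter(String path) {
            if (!"viewOwner".equals(user)) {
                throw new SecurityException(user + " cannot open " + path);
            }
            return "footer of " + path;
        }
    }

    private static final Map<String, Fs> CACHE = new ConcurrentHashMap<>();

    // The cache key combines the URI with the current user, so the caller's
    // identity at lookup time decides which handle is returned.
    static Fs getFileSystem(String uri) {
        String user = currentUser();
        return CACHE.computeIfAbsent(uri + "|" + user, key -> new Fs(user));
    }

    // Hypothetical planner step: always resolve the handle as the view owner.
    static String readFooterAsViewOwner(String uri, String path) {
        return doAs("viewOwner", () -> getFileSystem(uri).readFooter(path));
    }

    public static void main(String[] args) {
        String uri = "hdfs://cluster";
        String path = "/data/t.parquet";
        // Succeeds regardless of who is current on this thread.
        System.out.println(doAs("queryUser", () -> readFooterAsViewOwner(uri, path)));
        try {
            // Without the wrapper, the lookup happens as queryUser and is denied.
            doAs("queryUser", () -> getFileSystem(uri).readFooter(path));
        } catch (SecurityException e) {
            System.out.println("denied: " + e.getMessage());
        }
    }
}
```

Because the user is part of the cache key, the same URI resolves to different
handles depending on which identity happens to be current on the calling
thread. Pinning the identity around the lookup makes the result independent of
the caller.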
> Intermittent failure while reading Parquet file footer during planning phase
> ----------------------------------------------------------------------------
>
> Key: DRILL-6100
> URL: https://issues.apache.org/jira/browse/DRILL-6100
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Flow
> Affects Versions: 1.10.0
> Reporter: Sorabh Hamirwasia
> Assignee: Sorabh Hamirwasia
> Priority: Major
> Fix For: 1.13.0
>
>
> When running queries from multiple users against a view that refers to a
> Parquet data file, an intermittent failure is seen during the planning phase.
> The failure happens when the Parquet data file, which the view owner has
> access to, is read to create the metadata cache. The query user doesn't have
> direct access to the Parquet data file but has read access to the view, which
> in turn accesses the actual data. The expectation is that the view owner is
> used when accessing the Parquet data. But when concurrent queries are run
> from several clients, sporadic failures are observed.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)