[
https://issues.apache.org/jira/browse/DRILL-6100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16333381#comment-16333381
]
Sorabh Hamirwasia edited comment on DRILL-6100 at 1/21/18 8:18 PM:
-------------------------------------------------------------------
Investigation shows that Parquet data is sometimes accessed as the Query User
instead of the ProcessUser, so the FileSystem throws an AccessControlException
when trying to open the file. At planning time, Drill tries to create the
Parquet metadata cache by reading the footer of each data file. The readFooter
API in the Parquet library gets the FileSystem instance from a cache based on
the URI of the file path. When it looks into the cache, it creates a key that
includes the current user, obtained from UserGroupInformation.getCurrentUser(),
and this call returns a different user at different times (queryUser,
viewOwner, or ProcessUser). To make sure this call always happens in the
context of the process user, Drill should call the [readFooter
api|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java#L421]
inside a doAs block of the process user UGI, at least during planning. Using
the process user here is fine, since the metadata file is also created as the
process user and is used by Drill across queries. A separate JIRA should be
created to evaluate all the SCAN operators as well, to confirm that data is
read through the correct FileSystem instance.
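
The caching behaviour described above can be illustrated with a small,
self-contained model. This is only a sketch, not the Hadoop, Parquet, or Drill
code: FsCacheModel, FileSystemHandle, the thread-local current user, the
"maprfs:///" authority string, and the /data/t.parquet path with its reader
list are all hypothetical stand-ins. It shows why a cache key that includes the
current user hands back a FileSystem bound to whichever user happens to be
active, and how wrapping the footer read in a doAs-style block pins that user.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Simplified model of a per-user FileSystem cache (hypothetical, not Hadoop's implementation).
final class FsCacheModel {
    // Stand-in for UserGroupInformation.getCurrentUser(): the user bound to the calling thread.
    private static final ThreadLocal<String> CURRENT_USER =
            ThreadLocal.withInitial(() -> "processUser");

    // Hypothetical permissions: which users may open which path.
    private static final Map<String, Set<String>> READERS = new HashMap<>();
    static {
        READERS.put("/data/t.parquet",
                new HashSet<>(Arrays.asList("processUser", "viewOwner")));
    }

    // Cache keyed on file system authority plus current user.
    private static final Map<String, FileSystemHandle> CACHE = new ConcurrentHashMap<>();

    // A file system instance that acts as the user who created it.
    static final class FileSystemHandle {
        final String user;

        FileSystemHandle(String user) {
            this.user = user;
        }

        String open(String path) {
            Set<String> allowed = READERS.getOrDefault(path, Collections.<String>emptySet());
            if (!allowed.contains(user)) {
                throw new SecurityException(
                        "Open failed for file: " + path + " (user " + user + ")");
            }
            return "footer of " + path;
        }
    }

    // Stand-in for a doAs block: run the action with the given user, then restore the previous one.
    static <T> T doAs(String user, Supplier<T> action) {
        String previous = CURRENT_USER.get();
        CURRENT_USER.set(user);
        try {
            return action.get();
        } finally {
            CURRENT_USER.set(previous);
        }
    }

    // Stand-in for the footer read: looks up the cached instance for (authority, current user).
    static String readFooter(String authority, String path) {
        String user = CURRENT_USER.get();
        FileSystemHandle fs =
                CACHE.computeIfAbsent(authority + "|" + user, k -> new FileSystemHandle(user));
        return fs.open(path);
    }

    public static void main(String[] args) {
        // Planning runs in the query user's context: the cache returns that user's instance.
        try {
            doAs("queryUser", () -> readFooter("maprfs:///", "/data/t.parquet"));
        } catch (SecurityException e) {
            System.out.println("Without pinning: " + e.getMessage());
        }
        // Pinning the process user around the read makes the lookup deterministic.
        String footer = doAs("queryUser",
                () -> doAs("processUser", () -> readFooter("maprfs:///", "/data/t.parquet")));
        System.out.println("With pinning: " + footer);
    }
}
```

In the real code the analogous step would be to run the footer read inside a
doAs call on the process user's UserGroupInformation; the exact call site and
helper used to obtain that UGI are left to the actual patch.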
was (Author: shamirwasia):
Investigation shows that Parquet data is sometimes accessed as the Query User
instead of the View Owner, so the FileSystem throws an AccessControlException
when trying to open the file. At planning time, Drill tries to create the
Parquet metadata cache by reading the footer of each data file. The readFooter
API in the Parquet library gets the FileSystem instance from a cache based on
the URI of the file path. When it looks into the cache, it creates a key that
includes the current user, obtained from UserGroupInformation.getCurrentUser(),
and this call returns a different user at different times (queryUser,
viewOwner, or ProcessUser). To make sure this call always returns the view
owner, Drill should call the [readFooter
api|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java#L421]
inside a doAs block of the ViewOwner UGI.
> Intermittent failure while reading Parquet file footer during planning phase
> ----------------------------------------------------------------------------
>
> Key: DRILL-6100
> URL: https://issues.apache.org/jira/browse/DRILL-6100
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Flow
> Affects Versions: 1.10.0
> Reporter: Sorabh Hamirwasia
> Assignee: Sorabh Hamirwasia
> Priority: Major
> Fix For: 1.13.0
>
>
> When running queries from multiple users against a view that refers to a
> Parquet data file, intermittent failures are seen during the planning phase.
> The failure happens when the Parquet data file, which the view owner has
> access to, is read to create the metadata cache. The query user doesn't have
> direct access to the Parquet data file but has read access to the view, which
> in turn accesses the actual data. When the Parquet metadata file is created,
> it is created as the ProcessUser (per DRILL-4143), but the footer is not read
> under the process user context. While running concurrent queries from several
> clients, sporadic failures were observed because at times the footer was
> being read as the Query User, which doesn't have access to the file.
>
> {code:java}
> 2018-01-12 13:19:57,267
> Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Failure creating scan.
> at org.apache.drill.exec.planner.logical.DrillScanRel.<init>(DrillScanRel.java:92) ~[drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.planner.logical.DrillScanRel.<init>(DrillScanRel.java:70) ~[drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.planner.logical.DrillScanRel.<init>(DrillScanRel.java:63) ~[drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.planner.logical.DrillScanRule.onMatch(DrillScanRule.java:37) ~[drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228) ~[calcite-core-1.4.0-drill-r21.jar:1.4.0-drill-r21]
> ... 15 common frames omitted
> Caused by: org.apache.hadoop.security.AccessControlException: Open failed for file: /env/test/data/final
> at com.mapr.fs.MapRClientImpl.open(MapRClientImpl.java:265) ~[maprfs-5.2.2-mapr.jar:5.2.2-mapr]
> at com.mapr.fs.MapRFileSystem.open(MapRFileSystem.java:938) ~[maprfs-5.2.2-mapr.jar:5.2.2-mapr]
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:803) ~[hadoop-common-2.7.0-mapr-1607.jar:na]
> at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:425) ~[parquet-hadoop-1.8.1-drill-r0.jar:1.8.1-drill-r0]
> at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:412) ~[parquet-hadoop-1.8.1-drill-r0.jar:1.8.1-drill-r0]
> at org.apache.drill.exec.store.parquet.Metadata.getParquetFileMetadata_v3(Metadata.java:395) ~[drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.store.parquet.Metadata.access$000(Metadata.java:85) ~[drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.store.parquet.Metadata$MetadataGatherer.runInner(Metadata.java:323) ~[drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.store.parquet.Metadata$MetadataGatherer.runInner(Metadata.java:311) ~[drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.store.TimedRunnable.run(TimedRunnable.java:56) ~[drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.store.TimedRunnable.run(TimedRunnable.java:122) ~[drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.store.parquet.Metadata.getParquetFileMetadata_v3(Metadata.java:285) ~[drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata(Metadata.java:264) ~[drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata(Metadata.java:249) ~[drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata(Metadata.java:121) ~[drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.store.parquet.ParquetGroupScan.init(ParquetGroupScan.java:733) ~[drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.store.parquet.ParquetGroupScan.<init>(ParquetGroupScan.java:230) ~[drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.store.parquet.ParquetGroupScan.<init>(ParquetGroupScan.java:190) ~[drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan(ParquetFormatPlugin.java:169) ~[drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan(ParquetFormatPlugin.java:67) ~[drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.store.dfs.FileSystemPlugin.getPhysicalScan(FileSystemPlugin.java:146) ~[drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.store.AbstractStoragePlugin.getPhysicalScan(AbstractStoragePlugin.java:100) ~[drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.planner.logical.DrillTable.getGroupScan(DrillTable.java:85) ~[drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.planner.logical.DrillScanRel.<init>(DrillScanRel.java:90) ~[drill-java-exec-1.10.0.jar:1.10.0]
> ... 19 common frames omitted
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: AccessControlException: Open failed for file: /env/test/data/final/snapshot_period_id=1234567/000000_0, error: Permission denied (13)
> {code}
>