[
https://issues.apache.org/jira/browse/DRILL-6609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Timothy Farkas updated DRILL-6609:
----------------------------------
Description:
Currently, when reading a Parquet file in Hive, we try to speed things up by
doing a native Parquet scan with HiveDrillNativeParquetRowGroupScan. When
retrieving the FileSystem Configuration in
HiveDrillNativeParquetRowGroupScan.getFsConf, we use all the properties
defined for the HiveStoragePlugin. This could allow a misconfiguration in the
HiveStoragePlugin to influence the configuration of our FileSystem.
It is currently unclear whether this behavior was intended. If it was, we need
to document why it was done. If it was not, we need to fix the issue.
This may be the root cause of the issue discovered by Chun.
To reproduce the issue:
1) Set up a cluster with two or more nodes.
2) Enable impersonation.
3) Set "fs.default.name": "file:///" in the Hive storage plugin.
4) Restart the drillbits.
5) As a regular user, on node A, drop the table/file.
6) Run a CTAS from a large enough Hive table as the source to recreate the
   table/file.
7) Query the table from node A; this should work.
8) Query the table from node B as the same user; this should reproduce the
   issue.
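The concern above can be illustrated with a minimal sketch. It does not use
Drill or Hadoop classes: plain maps stand in for the FileSystem Configuration
and for the HiveStoragePlugin properties. The class FsConfSketch, the method
mergeAll, the base value "hdfs://namenode:8020", and the metastore URI are all
hypothetical names and values chosen for illustration. The only value taken
from this issue is "fs.default.name": "file:///" from step 3.

```java
import java.util.HashMap;
import java.util.Map;

public class FsConfSketch {

  // Copies every plugin property on top of the base settings, so any
  // plugin key that collides with a base key replaces the base value.
  public static Map<String, String> mergeAll(Map<String, String> base,
                                             Map<String, String> pluginProps) {
    Map<String, String> merged = new HashMap<>(base);
    merged.putAll(pluginProps);
    return merged;
  }

  public static void main(String[] args) {
    Map<String, String> base = new HashMap<>();
    // Hypothetical cluster-wide default; not taken from the issue.
    base.put("fs.default.name", "hdfs://namenode:8020");

    Map<String, String> pluginProps = new HashMap<>();
    // Value from step 3 of the reproduction steps.
    pluginProps.put("fs.default.name", "file:///");
    // Hypothetical Hive-only property, for illustration.
    pluginProps.put("hive.metastore.uris", "thrift://metastore:9083");

    Map<String, String> merged = mergeAll(base, pluginProps);
    // The plugin value wins over the base value.
    System.out.println(merged.get("fs.default.name")); // prints file:///
  }
}
```

If getFsConf behaves like mergeAll, a "file:///" default would make each
drillbit resolve paths against its own local disk. That would be consistent
with the table being readable from node A but not from node B, but it is an
assumption to verify, not a confirmed diagnosis.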
was:
Currently, when reading a Parquet file in Hive, we try to speed things up by
doing a native Parquet scan with HiveDrillNativeParquetRowGroupScan. When
retrieving the FileSystem Configuration in
HiveDrillNativeParquetRowGroupScan.getFsConf, we use all the properties
defined for the HiveStoragePlugin. This could allow a misconfiguration in the
HiveStoragePlugin to influence the configuration of our FileSystem.
It is currently unclear whether this behavior was intended. If it was, we need
to document why it was done. If it was not, we need to fix the issue.
> Investigate Creation of FileSystem Configuration for Hive Parquet Files
> -----------------------------------------------------------------------
>
> Key: DRILL-6609
> URL: https://issues.apache.org/jira/browse/DRILL-6609
> Project: Apache Drill
> Issue Type: Task
> Reporter: Timothy Farkas
> Priority: Major
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)