[jira] [Updated] (DRILL-6609) Investigate Creation of FileSystem Configuration for Hive Parquet Files

Timothy Farkas (JIRA) Wed, 22 Aug 2018 16:58:23 -0700


     [ https://issues.apache.org/jira/browse/DRILL-6609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Timothy Farkas updated DRILL-6609:
----------------------------------
    Description: 
Currently, when reading a Parquet file in Hive, we try to speed things up by 
doing a native Parquet scan with HiveDrillNativeParquetRowGroupScan. When 
retrieving the FileSystem Configuration to use in 
HiveDrillNativeParquetRowGroupScan.getFsConf, we use all the properties defined 
for the HiveStoragePlugin. As a result, a misconfiguration in the 
HiveStoragePlugin could influence the configuration of our FileSystem.
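
To illustrate the concern, here is a minimal sketch in plain Java. It does not 
use Drill or Hadoop classes: the class FsConfSketch, the method buildFsConf, and 
the host names are hypothetical stand-ins, and plain maps replace the real 
Configuration object. It only shows that when every plugin property is copied on 
top of the file system defaults, a plugin-level "fs.default.name" silently wins.

```java
import java.util.HashMap;
import java.util.Map;

public class FsConfSketch {

    // Hypothetical stand-in for the behavior described above: start from the
    // file system defaults, then copy every storage plugin property on top.
    public static Map<String, String> buildFsConf(Map<String, String> fsDefaults,
                                                  Map<String, String> hivePluginProps) {
        Map<String, String> conf = new HashMap<>(fsDefaults);
        conf.putAll(hivePluginProps); // plugin values win on any key collision
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> fsDefaults = new HashMap<>();
        // Assumed cluster-wide default, for illustration only.
        fsDefaults.put("fs.default.name", "hdfs://namenode:8020");

        Map<String, String> hivePluginProps = new HashMap<>();
        // Assumed metastore address, for illustration only.
        hivePluginProps.put("hive.metastore.uris", "thrift://metastore:9083");
        // A file system setting placed in the Hive storage plugin.
        hivePluginProps.put("fs.default.name", "file:///");

        Map<String, String> conf = buildFsConf(fsDefaults, hivePluginProps);
        // The plugin-level value has replaced the cluster default.
        System.out.println(conf.get("fs.default.name"));
    }
}
```

In this sketch the merged configuration ends up pointing at the local file 
system, even though the defaults pointed at HDFS.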

Currently it is unclear if this was desired behavior or not. If it is desired 
we need to document why it was done. If it is not desired we need to fix the 
issue.

This may be the root cause of the issue discovered by chun.

To reproduce the issue:
1) Use a cluster with two or more nodes.
2) Enable impersonation.
3) Set "fs.default.name": "file:///" in the Hive storage plugin.
4) Restart the drillbits.
5) As a regular user, on node A, drop the table/file.
6) Run a CTAS from a large enough Hive table as the source to recreate the 
   table/file.
7) Query the table from node A; this should work.
8) Query the table from node B as the same user; this should reproduce the 
   issue.

  was:
Currently, when reading a Parquet file in Hive, we try to speed things up by 
doing a native Parquet scan with HiveDrillNativeParquetRowGroupScan. When 
retrieving the FileSystem Configuration to use in 
HiveDrillNativeParquetRowGroupScan.getFsConf, we use all the properties defined 
for the HiveStoragePlugin. As a result, a misconfiguration in the 
HiveStoragePlugin could influence the configuration of our FileSystem.

Currently it is unclear if this was desired behavior or not. If it is desired 
we need to document why it was done. If it is not desired we need to fix the 
issue.


> Investigate Creation of FileSystem Configuration for Hive Parquet Files
> -----------------------------------------------------------------------
>
>                 Key: DRILL-6609
>                 URL: https://issues.apache.org/jira/browse/DRILL-6609
>             Project: Apache Drill
>          Issue Type: Task
>            Reporter: Timothy Farkas
>            Priority: Major



