[ https://issues.apache.org/jira/browse/DRILL-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495729#comment-16495729 ]
ASF GitHub Bot commented on DRILL-5365:
---------------------------------------
ilooner commented on a change in pull request #796: DRILL-5365: DrillFileSystem setConf in constructor. DrillFileSystem c…
URL: https://github.com/apache/drill/pull/796#discussion_r191933248
##########
File path:
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java
##########
@@ -74,6 +74,7 @@ public FileSystemPlugin(FileSystemConfig config, DrillbitContext context, String
fsConf.set(s, config.config.get(s));
}
}
+ fsConf.set("fs.default.name", config.connection);
Review comment:
Okay, I think I have a better handle on this now. The original issue was that
Drill's Hive storage plugin had the configuration option fs.default.name =
file://. When a Hive table was dropped and then recreated with a CTAS
statement in Drill, the CTAS statement somehow picked up the fs.default.name
configuration from the Hive storage plugin and passed it on to
DrillFileSystem. Apparently, if both fs.default.name and fs.defaultFS are
present with different values, the value of fs.default.name wins even though it
is deprecated. So the CTAS statement would end up creating the table on a Drill
node's local file system.
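The precedence described above can be modeled without Hadoop on the classpath. This is a hypothetical illustration, not Hadoop's `Configuration` code: plain maps stand in for the configuration, the class and method names are invented, the `file:///` fallback and the `hdfs://namenode:8020` URI are placeholders, and the rule that the deprecated key wins is taken from the comment's "apparently" rather than verified against Hadoop's source.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the key precedence described in the review comment.
// This is NOT Hadoop's Configuration implementation.
public class DefaultFsPrecedence {
    public static final String DEPRECATED_KEY = "fs.default.name";
    public static final String CURRENT_KEY = "fs.defaultFS";

    // Assumption (from the comment): when both keys are present with different
    // values, the deprecated fs.default.name wins.
    public static String effectiveDefaultFs(Map<String, String> conf) {
        if (conf.containsKey(DEPRECATED_KEY)) {
            return conf.get(DEPRECATED_KEY);
        }
        // Placeholder fallback when neither key is set.
        return conf.getOrDefault(CURRENT_KEY, "file:///");
    }

    public static void main(String[] args) {
        Map<String, String> fsConf = new HashMap<>();
        // The file system plugin's intended connection (placeholder URI).
        fsConf.put(CURRENT_KEY, "hdfs://namenode:8020");
        // A stray value picked up from the Hive storage plugin configuration.
        fsConf.put(DEPRECATED_KEY, "file:///");
        // Prints file:///, i.e. the CTAS would target the local file system.
        System.out.println(effectiveDefaultFs(fsConf));
    }
}
```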
I believe the crux of this PR is to force "fs.default.name" to have the
correct value in case a different value is defined in the Hive storage
plugin.
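A minimal sketch of that fix, assuming plain maps in place of Hadoop's `Configuration`. The class `ForceDefaultFs`, the method `buildFsConf`, its parameters, and the step that seeds `fs.defaultFS` from the connection are all hypothetical; only the final override of `fs.default.name` with the plugin's connection mirrors the line added in the diff.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "force fs.default.name after merging configs".
// Plain maps stand in for Hadoop's Configuration.
public class ForceDefaultFs {
    // inherited:    properties that leaked in from elsewhere (e.g. the Hive plugin)
    // pluginConfig: the file system plugin's own "config" entries
    // connection:   the file system plugin's "connection" URI
    public static Map<String, String> buildFsConf(Map<String, String> inherited,
                                                  Map<String, String> pluginConfig,
                                                  String connection) {
        // Copy so the caller's map is left untouched.
        Map<String, String> fsConf = new HashMap<>(inherited);
        // Assumption: the new-style key is seeded from the connection first.
        fsConf.put("fs.defaultFS", connection);
        // Copy the plugin's own entries, like the loop shown in the diff.
        fsConf.putAll(pluginConfig);
        // Mirrors the added line: set the deprecated key last so no stray
        // value can override the connection.
        fsConf.put("fs.default.name", connection);
        return fsConf;
    }

    public static void main(String[] args) {
        Map<String, String> inherited = new HashMap<>();
        inherited.put("fs.default.name", "file:///");
        Map<String, String> fsConf =
                buildFsConf(inherited, new HashMap<>(), "hdfs://namenode:8020");
        System.out.println(fsConf.get("fs.default.name")); // hdfs://namenode:8020
    }
}
```

Because the override runs after everything else is copied, a `fs.default.name` value from either the leaked properties or the plugin's own config is replaced, while every other stray key still passes through unchanged.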
With that said, there are several questions.
1. How the heck does a property in the HiveStoragePlugin make its way into
the FileSystem configuration? I spent a good amount of time looking at the code,
and for the life of me I can't figure that out.
2. The follow-up to (1): do we actually want that behavior? We can force
fs.default.name to have the right value, but what about other properties we
might suck in from a HiveStoragePlugin configuration?
3. If we don't want this behavior what would be the real fix?
In the face of all this ambiguity, I think we should move forward with a
minimal PR that forces fs.default.name to be correct now. We can open a follow-up
JIRA to fix the underlying problem of sucking in stray configs
down the road if someone complains about it.
> FileNotFoundException when reading a parquet file
> -------------------------------------------------
>
> Key: DRILL-5365
> URL: https://issues.apache.org/jira/browse/DRILL-5365
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Hive
> Affects Versions: 1.10.0
> Reporter: Chun Chang
> Assignee: Timothy Farkas
> Priority: Major
> Fix For: 1.14.0
>
>
> The Parquet file is generated through the following CTAS.
> To reproduce the issue:
> 1) use a cluster of two or more nodes;
> 2) enable impersonation;
> 3) set "fs.default.name": "file:///" in the Hive storage plugin;
> 4) restart the drillbits;
> 5) as a regular user, on node A, drop the table/file;
> 6) run a CTAS from a large enough Hive table as the source to recreate the table/file;
> 7) querying the table from node A should work;
> 8) querying from node B as the same user should reproduce the issue.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)