[
https://issues.apache.org/jira/browse/DRILL-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497412#comment-16497412
]
ASF GitHub Bot commented on DRILL-5365:
---------------------------------------
paul-rogers commented on issue #1296: DRILL-5365: Prevent plugin config from
changing default fs. Make DrillFileSystem Immutable.
URL: https://github.com/apache/drill/pull/1296#issuecomment-393726044
@ilooner, the fix will probably work, but it seems a bit of a hack. It is not
clear why Hive changes the default file system, though we can speculate. Drill
likely uses Hive as a metastore. (Later versions of Hive split the metastore
out into the Hive Metastore, or HMS, so ideally that's what Drill would call
it.)
It is unfortunate that Drill requires users to copy their Hive config from a
file into a Drill JSON config object. Keeping two copies, in two formats, is
generally frowned upon.
So, we have HMS, describing data on disk. Presumably HMS wants to state the
file system that contains the file, and does so as part of its config.
Now, we want readers to read those files. One would expect the "hive"
storage plugin to replace Drill's format plugin mechanisms with its own
file-to-format mapping. When reading HMS files, Drill would use the "hive"
options and formats. Is that how it works? Not sure.
I believe that "hive" has its own set of readers. I've seen indications that
one cannot, say, use the Drill native CSV or Parquet readers for "hive"
files. The question is how all this is wired together. (I don't know; I
haven't looked at this code.)
Supposedly, if we query a "hive" table, we need to use the "hive" file
system info.
Where does this "hive" info leak into a non-"hive" reader? Perhaps joining a
"hive" file with a "dfs" file?
With this fix, would that join work?
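A minimal, hypothetical sketch of the kind of leak being asked about, assuming the "hive" plugin's properties end up in settings shared with other readers. Plain `Map`s stand in for Hadoop's `Configuration`, and the names `DefaultFsLeakSketch`, `applyInPlace`, and `snapshot` are invented for illustration; they are not Drill or Hadoop APIs. The key `fs.default.name` comes from the repro steps in the issue (it is the older, deprecated name of Hadoop's `fs.defaultFS`); the `hdfs://namenode:8020` value is an assumed placeholder.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the suspected leak. Plain Maps stand in for Hadoop's
// Configuration; none of these names are Drill or Hadoop APIs.
final class DefaultFsLeakSketch {
    // Key taken from the repro steps (older name of Hadoop's fs.defaultFS).
    static final String DEFAULT_FS_KEY = "fs.default.name";

    // Risky pattern: plugin props are written straight into the settings that
    // every reader shares, so a "dfs" reader later sees the "hive" default FS.
    static void applyInPlace(Map<String, String> shared, Map<String, String> pluginProps) {
        shared.putAll(pluginProps);
    }

    // Safer pattern: layer plugin props onto a private copy and freeze it,
    // leaving the shared settings untouched.
    static Map<String, String> snapshot(Map<String, String> shared, Map<String, String> pluginProps) {
        Map<String, String> copy = new HashMap<>(shared);
        copy.putAll(pluginProps);
        return Collections.unmodifiableMap(copy);
    }

    public static void main(String[] args) {
        Map<String, String> shared = new HashMap<>();
        shared.put(DEFAULT_FS_KEY, "hdfs://namenode:8020"); // assumed placeholder URI
        Map<String, String> hiveProps = Map.of(DEFAULT_FS_KEY, "file:///");

        // A per-plugin snapshot sees the hive value; the shared map does not.
        Map<String, String> hiveView = snapshot(shared, hiveProps);
        System.out.println("hive sees: " + hiveView.get(DEFAULT_FS_KEY));
        System.out.println("dfs sees:  " + shared.get(DEFAULT_FS_KEY));

        // Applying in place overwrites the default FS for every reader.
        applyInPlace(shared, hiveProps);
        System.out.println("dfs after in-place apply: " + shared.get(DEFAULT_FS_KEY));
    }
}
```

Running `main` prints `file:///` for the hive view and the HDFS placeholder for the shared settings, then `file:///` for the shared settings once the in-place apply has run. A copy-then-freeze snapshot is in the spirit of the PR title's "Make DrillFileSystem Immutable", but whether the PR works this way is not confirmed here; in this model, a join of a "hive" and a "dfs" table would only be safe if each side reads through its own snapshot.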
In short, I think we need to understand the above to make sure we aren't
playing Whack-a-Mole, fixing one bug only to introduce another.
----------------------------------------------------------------
> FileNotFoundException when reading a parquet file
> -------------------------------------------------
>
> Key: DRILL-5365
> URL: https://issues.apache.org/jira/browse/DRILL-5365
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Hive
> Affects Versions: 1.10.0
> Reporter: Chun Chang
> Assignee: Timothy Farkas
> Priority: Major
> Fix For: 1.14.0
>
>
> The parquet file is generated through the following CTAS.
> To reproduce the issue:
> 1) Use a cluster of two or more nodes.
> 2) Enable impersonation.
> 3) Set "fs.default.name": "file:///" in the hive storage plugin.
> 4) Restart the drillbits.
> 5) As a regular user on node A, drop the table/file.
> 6) Recreate the table/file with a CTAS that uses a large enough hive table
> as its source.
> 7) Query the table from node A; this should work.
> 8) Query the table from node B as the same user; this should reproduce the
> issue.