Abhishek Girish created DRILL-3230:
--------------------------------------
Summary: Local file system plug-in must be disabled in distributed mode
Key: DRILL-3230
URL: https://issues.apache.org/jira/browse/DRILL-3230
Project: Apache Drill
Issue Type: Bug
Components: Client - HTTP
Reporter: Abhishek Girish
Assignee: Jacques Nadeau
The local file system plug-in (the "file:///" connection string in the dfs
storage plug-in) does not behave as expected, for either CTAS or querying
files, when Drill is configured in distributed mode (multiple Drillbits across
nodes).
In the case of CTAS, Parquet files will be written to a specific node's local
file system, depending on which Drillbit the client connects to. If the table
is moderate to large in size, Drill may process it in a distributed manner and
write data to more than one node, so the table's data ends up partitioned
across different nodes.
In the case of queries, the behavior is again confusing, as it depends on which
Drillbit the client connects to. The results are therefore inconsistent:
queries return only partial results, and which part is returned depends on the
Drillbit the client is connected to.
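To make the failure mode concrete, here is a toy Python model of the two
paragraphs above (not Drill code). It assumes three nodes with private local
disks, a CTAS that spreads rows round-robin across one writer per node, and,
as a simplification, that a query over "file:///" only sees the files on the
disk of the Drillbit the client connects to. The node names, table path, and
file names are hypothetical.

```python
# Toy model of the reported behavior; NOT Drill's implementation.
# Each node has its own private local disk: path -> list of rows.
NODES = ["node-a", "node-b", "node-c"]  # hypothetical Drillbit hosts
local_fs = {node: {} for node in NODES}


def ctas(table_dir, rows, writer_nodes):
    """Simulate a distributed CTAS over "file:///": rows are split
    round-robin across writer fragments, and each fragment writes its
    file to the local disk of the node it runs on."""
    for i, node in enumerate(writer_nodes):
        fragment_rows = rows[i::len(writer_nodes)]
        local_fs[node][f"{table_dir}/part-{i}.parquet"] = fragment_rows


def query_count(table_dir, connected_node):
    """Simplifying assumption: a query reading "file:///" only sees the
    files present on the disk of the Drillbit the client is connected to."""
    prefix = table_dir + "/"
    return sum(len(rows) for path, rows in local_fs[connected_node].items()
               if path.startswith(prefix))


ctas("/tmp/t1", list(range(1000)), writer_nodes=NODES)
for node in NODES:
    # Each node reports a different, partial row count for the same table.
    print(node, query_count("/tmp/t1", node))
```

In this model the same table reports 334, 333, and 333 rows depending on
whether the client connects to node-a, node-b, or node-c, instead of the 1000
rows that were written.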
My suggestion is that the local file system plug-in be disabled in distributed
mode. With multiple Drillbits and a centralized plug-in for the local file
system, consistent behavior cannot be expected.
It should either be disabled when distributed mode is detected, or we could add
support for multiple namespaces for local file systems, one per node and keyed
by the node's IP address (which might still not fix all issues). There may also
be other ways to resolve this that I am overlooking or not aware of.
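The two options proposed above could be sketched roughly as follows, in Python
rather than Drill's Java, purely to illustrate the decision logic. Everything
here is hypothetical: `PluginConfig`, `enforce_local_fs_policy`,
`per_node_namespaces`, the "hdfs" plug-in name, the node IPs, and the
assumption that the set of active Drillbits is available as a list of IP
strings are illustrations, not Drill APIs.

```python
from dataclasses import dataclass
from typing import Dict, List
from urllib.parse import urlparse


@dataclass
class PluginConfig:
    """Hypothetical, trimmed-down view of a file-system storage plug-in."""
    name: str
    connection: str  # e.g. "file:///" or "hdfs://namenode:8020/"
    enabled: bool = True


def is_local_fs(connection: str) -> bool:
    # "file:///" parses to scheme "file"; HDFS-style URIs do not.
    return urlparse(connection).scheme == "file"


def enforce_local_fs_policy(plugins: List[PluginConfig],
                            active_drillbits: List[str]) -> List[str]:
    """Option 1 (hypothetical): disable local-FS plug-ins once more than one
    Drillbit is active. Returns the names of the plug-ins that were disabled."""
    if len(active_drillbits) <= 1:
        return []  # single node: the local file system is consistent
    disabled = []
    for plugin in plugins:
        if plugin.enabled and is_local_fs(plugin.connection):
            plugin.enabled = False
            disabled.append(plugin.name)
    return disabled


def per_node_namespaces(plugin_name: str,
                        active_drillbits: List[str]) -> Dict[str, str]:
    """Option 2 (hypothetical): one namespace per node, named after the
    node's IP, mapping namespace -> node that owns the local disk."""
    return {f"{plugin_name}_{ip.replace('.', '_')}": ip
            for ip in active_drillbits}


if __name__ == "__main__":
    plugins = [PluginConfig("dfs", "file:///"),
               PluginConfig("hdfs", "hdfs://namenode:8020/")]
    print(enforce_local_fs_policy(plugins, ["10.0.0.1", "10.0.0.2"]))
    print(per_node_namespaces("dfs", ["10.0.0.1", "10.0.0.2"]))
```

Option 2 only names the namespaces; a query against a per-node namespace would
presumably still have to read its files on that node, which is one plausible
reason such an approach might not fix all issues.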
Many issues have been reported on the user mailing list where users observed
this inconsistent behavior.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)