Hari Sekhon created DRILL-3625:
----------------------------------
Summary: Dynamic Format Detection in DFS backend for unmapped file
extensions / files without extensions
Key: DRILL-3625
URL: https://issues.apache.org/jira/browse/DRILL-3625
Project: Apache Drill
Issue Type: New Feature
Components: Storage - JSON, Storage - Other, Storage - Parquet,
Storage - Text & CSV
Affects Versions: 1.1.0
Reporter: Hari Sekhon
Assignee: Steven Phillips
When querying a JSON file whose extension is not ".json" (for example ".log"),
I get this exception:
{code}
0: jdbc:drill:zk=local> select * from dfs.down.`auditOut.log` limit 1;
Aug 11, 2015 4:01:38 PM org.apache.calcite.sql.validate.SqlValidatorException <init>
SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Table 'dfs.down.auditOut.log' not found
Aug 11, 2015 4:01:38 PM org.apache.calcite.runtime.CalciteException <init>
SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1, column 15 to line 1, column 17: Table 'dfs.down.auditOut.log' not found
Error: PARSE ERROR: From line 1, column 15 to line 1, column 17: Table 'dfs.down.auditOut.log' not found
[Error Id: 5610210b-3eb2-497f-9443-c725b29733b6 on <host>:31010] (state=,code=0)
{code}
However, after renaming the file to have a .json extension, the query
succeeds.
I could reconfigure the DFS plugin to map all files with a .log extension to
JSON, but that doesn't seem like the right thing to do. Renaming the file to
have a .json extension is the better option, but this raises another question:
why doesn't this just work as-is?
Hence I'd like to raise this as a feature request: when Drill encounters an
unmapped extension or a file without any extension, it should do a few quick
checks to determine the file type and then use the appropriate storage backend
for the file.
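To illustrate the kind of quick checks I have in mind, here is a minimal
Python sketch. It is not Drill code, and the function name detect_format, the
sample size, and the order of the checks are all assumptions made for
illustration. It looks for the Parquet magic bytes "PAR1" at the start of the
file, then tries to parse a leading JSON object or array, and otherwise falls
back to plain text:

```python
import json

PARQUET_MAGIC = b"PAR1"  # Parquet files start (and end) with these four bytes


def detect_format(path, sample_size=4096):
    """Guess "parquet", "json" or "text" from the first bytes of a file.

    Returns None when the sample is empty or looks binary.
    Hypothetical helper for illustration; not part of Drill.
    """
    with open(path, "rb") as f:
        head = f.read(sample_size)

    if not head:
        return None
    if head.startswith(PARQUET_MAGIC):
        return "parquet"
    if b"\x00" in head:
        # NUL bytes are a cheap hint that this is some other binary format.
        return None

    # errors="replace" tolerates a multi-byte character cut off by the sample.
    text = head.decode("utf-8", errors="replace").lstrip()
    try:
        # raw_decode parses only the first JSON value, so files holding one
        # record per line (or concatenated records) are accepted too.
        value, _ = json.JSONDecoder().raw_decode(text)
        if isinstance(value, (dict, list)):
            return "json"
    except ValueError:
        pass  # not JSON, or the first record is longer than the sample
    return "text"
```

With something like this, a file such as auditOut.log containing JSON records
would be reported as "json", and Drill could hand it to the JSON backend. A
JSON file whose first record is longer than the sample would be misreported as
"text", so a real implementation would need a more careful check.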
Adding this "Dynamic Format Detection", as I have dubbed it, would tie in nicely
with Drill's style and existing features such as the dynamic schema detection
already used for JSON.
This may also come in handy for dealing with output from MapReduce jobs, where
the files may be named part-m-NNNNN or part-r-NNNNN without any extension. If
those files were text, for example, the text storage backend could be invoked
on them immediately in Drill.
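As a rough sketch of when such detection would kick in, the Python below
checks a file name against an extension mapping and reports whether a content
check is needed. The EXTENSION_MAP contents and the function names are
hypothetical stand-ins for whatever the DFS plugin has configured, not Drill's
actual API:

```python
import os

# Hypothetical extension-to-format mapping, standing in for the formats
# configured on the DFS storage plugin.
EXTENSION_MAP = {"json": "json", "parquet": "parquet", "csv": "text"}


def mapped_format(filename, extension_map=EXTENSION_MAP):
    """Return the format mapped to the file's extension, or None if unmapped."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    return extension_map.get(ext)


def needs_detection(filename, extension_map=EXTENSION_MAP):
    """True when the extension is missing or unmapped, so content must be checked."""
    return mapped_format(filename, extension_map) is None
```

Here needs_detection("part-m-00000") and needs_detection("auditOut.log") are
both True, while needs_detection("auditOut.json") is False, so files with a
mapped extension keep their current behaviour.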