Hari Sekhon created DRILL-3625:
----------------------------------

             Summary: Dynamic Format Detection in DFS backend for unmapped file 
extensions / files without extensions
                 Key: DRILL-3625
                 URL: https://issues.apache.org/jira/browse/DRILL-3625
             Project: Apache Drill
          Issue Type: New Feature
          Components: Storage - JSON, Storage - Other, Storage - Parquet, 
Storage - Text & CSV
    Affects Versions: 1.1.0
            Reporter: Hari Sekhon
            Assignee: Steven Phillips


When querying a JSON file whose extension is not ".json" (for example ".log"), 
I get this exception:
{code}
0: jdbc:drill:zk=local> select * from dfs.down.`auditOut.log` limit 1;
Aug 11, 2015 4:01:38 PM org.apache.calcite.sql.validate.SqlValidatorException 
<init>
SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Table 
'dfs.down.auditOut.log' not found
Aug 11, 2015 4:01:38 PM org.apache.calcite.runtime.CalciteException <init>
SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1, column 
15 to line 1, column 17: Table 'dfs.down.auditOut.log' not found
Error: PARSE ERROR: From line 1, column 15 to line 1, column 17: Table 
'dfs.down.auditOut.log' not found

[Error Id: 5610210b-3eb2-497f-9443-c725b29733b6 on <host>:31010] (state=,code=0)
{code}
However, after renaming the file to have a .json extension, the query 
succeeds.

While I could reconfigure the DFS plugin to map all files with a .log 
extension to JSON, this doesn't seem like the right thing to do. I could of 
course rename the file to have a .json extension, which is the better option, 
but this raises another question: why doesn't this just work as-is?

Hence I'd like to raise this as a feature request: when an unmapped extension 
or a file without any extension is encountered, Drill should do a few quick 
checks on the file's contents to determine its type and then use the 
appropriate storage backend for the file.
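
As a rough illustration of the kind of quick checks meant here, the sketch 
below guesses a format from the first bytes of a file. It is only a sketch 
under assumptions: FormatSniffer and its Format enum are hypothetical names 
made up for this example and are not existing Drill classes, and the 
heuristics (the Parquet "PAR1" magic, a leading '{' or '[' for JSON, no 
control bytes for text) are deliberately simplistic.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of Drill): guesses a format from a file's leading bytes. */
final class FormatSniffer {

    enum Format { PARQUET, JSON, TEXT, UNKNOWN }

    // Parquet files begin (and end) with the 4-byte magic "PAR1".
    private static final byte[] PARQUET_MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

    /** Returns a best guess for the given head of a file, or UNKNOWN if nothing matches. */
    static Format detect(byte[] head) {
        if (head == null || head.length == 0) {
            return Format.UNKNOWN;
        }
        if (startsWith(head, PARQUET_MAGIC)) {
            return Format.PARQUET;
        }
        boolean sawNonWhitespace = false;
        Format guess = Format.TEXT;
        for (byte b : head) {
            int c = b & 0xFF;
            boolean whitespace = c == ' ' || c == '\t' || c == '\r' || c == '\n';
            // Control bytes other than whitespace suggest a binary file we do not recognise.
            if (c < 0x20 && !whitespace) {
                return Format.UNKNOWN;
            }
            // The first non-whitespace byte decides between JSON and plain text.
            if (!sawNonWhitespace && !whitespace) {
                sawNonWhitespace = true;
                if (c == '{' || c == '[') {
                    guess = Format.JSON;
                }
            }
        }
        // A file containing only whitespace gives us nothing to go on.
        return sawNonWhitespace ? guess : Format.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A caller would presumably read the first few kilobytes of the file, pass them 
to detect(), and fall back to today's behaviour on UNKNOWN. A text file whose 
first character happens to be '[' would be misclassified as JSON by this 
sketch, so a real implementation would need stronger checks.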

Adding this "Dynamic Format Detection", as I have dubbed it, would tie in 
nicely with Drill's style and existing features like the dynamic schema 
detection already used for JSON.

This may also come in handy for outputs from MapReduce jobs, where the files 
may be named part-m-NNNNN or part-r-NNNNN without any extension. If those 
files were text, for example, the text storage backend could be invoked on 
them immediately in Drill.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
