[ 
https://issues.apache.org/jira/browse/DRILL-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682046#comment-14682046
 ] 

Sudheesh Katkam commented on DRILL-3625:
----------------------------------------

You don't need to rename the file; did you try defining a
[default input format|https://drill.apache.org/docs/drill-default-input-format/]?
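
As a rough sketch of that suggestion (not taken from this issue): in the dfs storage plugin configuration, a workspace can carry a {{defaultInputFormat}} entry. The workspace name {{down}} comes from the query quoted below; the {{location}} path is a placeholder and the {{writable}} value is an assumption, so adjust both to your setup.

{code}
"workspaces": {
  "down": {
    "location": "/path/to/down",
    "writable": false,
    "defaultInputFormat": "json"
  }
}
{code}

With something like this in place, files in that workspace whose extension is missing or unmapped, such as auditOut.log, should be read as JSON without being renamed.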

> Dynamic Format Detection in DFS backend for unmapped file extensions / files 
> without extensions
> -----------------------------------------------------------------------------------------------
>
>                 Key: DRILL-3625
>                 URL: https://issues.apache.org/jira/browse/DRILL-3625
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Storage - JSON, Storage - Other, Storage - Parquet, 
> Storage - Text & CSV
>    Affects Versions: 1.1.0
>            Reporter: Hari Sekhon
>            Assignee: Steven Phillips
>
> When querying a JSON file whose extension is not ".json", such as ".log",
> I get this exception:
> {code}
> 0: jdbc:drill:zk=local> select * from dfs.down.`auditOut.log` limit 1;
> Aug 11, 2015 4:01:38 PM org.apache.calcite.sql.validate.SqlValidatorException <init>
> SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Table 'dfs.down.auditOut.log' not found
> Aug 11, 2015 4:01:38 PM org.apache.calcite.runtime.CalciteException <init>
> SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1, column 15 to line 1, column 17: Table 'dfs.down.auditOut.log' not found
> Error: PARSE ERROR: From line 1, column 15 to line 1, column 17: Table 'dfs.down.auditOut.log' not found
> [Error Id: 5610210b-3eb2-497f-9443-c725b29733b6 on <host>:31010] (state=,code=0)
> {code}
> However, after renaming the file to have a .json extension, the query
> succeeds.
> While I could reconfigure the DFS plugin to map all files with a .log
> extension to JSON, that doesn't seem like the right thing to do. I could of
> course rename the file to have a .json extension, which is the better
> option, but this raises another question: why doesn't this just work as-is?
> Hence I'd like to raise this as a feature request: when an unmapped
> extension or a file without any extension is encountered, Drill should do a
> few quick checks on the file type and then use the appropriate storage
> backend for the file.
> Adding this "Dynamic Format Detection" as I have dubbed it would tie in 
> nicely with Drill's style and existing features like the dynamic schema 
> detection already used for json.
> This may also come in handy for outputs from MapReduce jobs, where the
> files may be named part-m-NNNNN or part-r-NNNNN without any extension. If
> those files were text, for example, Drill could immediately invoke the text
> storage backend on them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
