[ 
https://issues.apache.org/jira/browse/DRILL-5550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16163919#comment-16163919
 ] 

ASF GitHub Bot commented on DRILL-5550:
---------------------------------------

Github user prasadns14 commented on the issue:

    https://github.com/apache/drill/pull/939
  
    @paul-rogers Can you please review the PR?


> SELECT non-existent column produces empty required VARCHAR
> ----------------------------------------------------------
>
>                 Key: DRILL-5550
>                 URL: https://issues.apache.org/jira/browse/DRILL-5550
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Text & CSV
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Priority: Minor
>
> Drill's CSV column reader supports two forms of files:
> * Files with column headers as the first line of the file.
> * Files without column headers.
> The CSV storage plugin specifies which format to use for files accessed via 
> that storage plugin config.
> Suppose we have a CSV file with headers:
> {code}
> a,b,c
> 10,foo,bar
> {code}
> Suppose we configure a storage plugin to use headers:
> {code}
>     TextFormatConfig csvFormat = new TextFormatConfig();
>     csvFormat.fieldDelimiter = ',';
>     csvFormat.skipFirstLine = false;
>     csvFormat.extractHeader = true;
> {code}
> (The above can also be done using JSON when running Drill as a server.)
> Execute the following query:
> {code}
> SELECT a, c, d FROM `dfs.data.example.csv`
> {code}
> Results:
> {code}
> a,c,d
> 10,bar,
> {code}
> The actual type of column {{d}} is non-nullable VARCHAR.
> This is inconsistent with other parts of Drill in two ways, one may be a bug. 
> Most other parts of Drill use a nullable INT for "missing" columns.
> 1. For CSV it makes sense for the data type to be VARCHAR, since all CSV 
> columns are of that type.
> 2. It may *not* make sense for the column to be non-nullable and blank rather 
> than nullable and NULL. In SQL, NULL means that the data is unknown, which is 
> the case here.
> In the future, we may want to use some other indication for a missing column. 
> Until then, the requested change is to make the type of a missing CSV column 
> a nullable VARCHAR set to value NULL.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to