[https://issues.apache.org/jira/browse/DRILL-5553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16028678#comment-16028678]

Paul Rogers commented on DRILL-5553:
------------------------------------

The problem appears to be in the planner, not the CSV reader. The following is 
a snippet of the physical plan given to the CSV reader:

{code}
    "columns" : [ "`*`" ],
{code}

As Arina noted elsewhere, the planner "compresses" the {{columns}} reference into 
{{*}} for the scanner, yet somehow expands it again further downstream. Because 
{{columns}} is special only to the CSV reader and not to Drill as a whole, the 
Project operator (perhaps) does not know that {{columns}} is supposed to be a 
VARCHAR array.
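
To make the suspected sequence concrete, here is a toy Java model. It is not Drill code, and the class and method names are hypothetical. It assumes three steps: the planner reduces any select list containing {{*}} to just {{*}} for the scan (as in the plan snippet above), the headered CSV reader returns one VARCHAR per header for {{*}}, and Project fills any name the scan did not produce with a nullable INT.

{code:java}
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of the behavior described above; NOT Drill's planner or reader code.
class StarCollapseModel {

    // Assumed planner step: a select list containing * is reduced to just *
    // for the scan, as in the physical plan snippet ("columns" : [ "`*`" ]).
    static List<String> scanColumns(List<String> selectList) {
        return selectList.contains("*") ? List.of("*") : selectList;
    }

    // Assumed headered CSV reader: * yields one VARCHAR per header; a named
    // column is returned only if it matches a header.
    static Map<String, String> scan(List<String> scanCols, List<String> headers) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String h : headers) {
            if (scanCols.contains("*") || scanCols.contains(h)) {
                out.put(h, "VARCHAR:REQUIRED");
            }
        }
        return out;
    }

    // Assumed Project step: * expands to every scanned column; a name the scan
    // did not produce gets the usual nullable INT placeholder.
    static Map<String, String> project(List<String> selectList, Map<String, String> scanned) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String item : selectList) {
            if (item.equals("*")) {
                out.putAll(scanned);
            } else {
                out.put(item, scanned.getOrDefault(item, "INT:OPTIONAL"));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> select = List.of("*", "columns");
        List<String> headers = List.of("a", "b", "c");
        // Prints the projected schema as name=type pairs.
        System.out.println(project(select, scan(scanColumns(select), headers)));
    }
}
{code}

With the example file's headers (a, b, c) and the select list {{*, columns}}, the model yields three VARCHAR columns plus {{columns}} as INT:OPTIONAL, matching the reported schema. It only shows that losing {{columns}} at the scan is enough to explain the result; where Drill actually drops it still has to be confirmed in the planner.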


> SELECT *, columns produces nonsense results
> -------------------------------------------
>
>                 Key: DRILL-5553
>                 URL: https://issues.apache.org/jira/browse/DRILL-5553
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Priority: Minor
>
> Consider the case discussed in DRILL-5551, with a slight variation.
> Input file: CSV with headers:
> {code}
> a,b,c
> 10,foo,bar
> {code}
> As in DRILL-5550, the CSV plugin is configured to use headers.
> Run this (admittedly strange) query:
> {code}
> SELECT *, columns FROM `dfs.data.example.csv`
> {code}
> The resulting schema is:
> {code}
> BatchSchema [fields=[
> a(VARCHAR:REQUIRED) [$offsets$(UINT4:REQUIRED)], 
> b(VARCHAR:REQUIRED) [$offsets$(UINT4:REQUIRED)], 
> c(VARCHAR:REQUIRED) [$offsets$(UINT4:REQUIRED)], 
> columns(INT:OPTIONAL) [$bits$(UINT1:REQUIRED), columns(INT:OPTIONAL)]], 
> selectionVector=NONE]
> {code}
> To make it easier to read:
> {code}
> a(VARCHAR:REQUIRED), 
> b(VARCHAR:REQUIRED),
> c(VARCHAR:REQUIRED),
> columns(INT:OPTIONAL)
> {code}
> In DRILL-5551, {{columns}} changes meaning from an array of columns to an 
> ordinary blank column. Here, it changes meaning again, to a nullable INT (our 
> normal "placeholder" for missing columns).
> Expected:
> 1. Per DRILL-5552, no other column reference may appear alongside "*".
> 2. If item 1 is not fixed, the scanner (or text reader) should forbid combining 
> either "*" or "columns" with other column references.
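
Expectation 2 could be enforced with a check along these lines. This is a hypothetical sketch: {{TextProjectionCheck}} and {{validate}} are illustrative names, not part of Drill's text reader. It also assumes that {{columns[n]}} references count as uses of {{columns}}, so several array indexes stay legal, and it rejects any list that mixes {{*}} or {{columns}} with other names.

{code:java}
import java.util.List;

// Hypothetical check for expectation 2; names are illustrative, not Drill's API.
class TextProjectionCheck {

    // Root name of a reference: "columns[0]" -> "columns", "a" -> "a".
    static String root(String ref) {
        int bracket = ref.indexOf('[');
        return bracket < 0 ? ref : ref.substring(0, bracket);
    }

    // Rejects "*" or "columns" combined with any other column reference.
    // Several array references such as columns[0], columns[1] are allowed.
    static void validate(List<String> refs) {
        for (String special : List.of("*", "columns")) {
            boolean hasSpecial = refs.stream().anyMatch(r -> root(r).equals(special));
            boolean hasOther = refs.stream().anyMatch(r -> !root(r).equals(special));
            if (hasSpecial && hasOther) {
                throw new IllegalArgumentException(
                    "\"" + special + "\" cannot be combined with other columns: " + refs);
            }
        }
    }

    public static void main(String[] args) {
        validate(List.of("*"));                        // accepted
        validate(List.of("columns[0]", "columns[2]")); // accepted
        validate(List.of("a", "b"));                   // accepted
        try {
            validate(List.of("*", "columns"));         // the query in this issue
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
{code}

Under such a check, the query in this issue would fail with an explicit error instead of returning a silently retyped {{columns}}.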



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
