[ 
https://issues.apache.org/jira/browse/DRILL-5762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16751221#comment-16751221
 ] 

benj commented on DRILL-5762:
-----------------------------

Following up on this subject:

Some words are reserved keywords in Drill. Fortunately, these keywords can still be 
used in a query simply by enclosing them in backquotes:
{code:java}
SELECT `select` FROM tmp.`myfile.csvh`;
{code}
But many words (filename, fqn, filepath, suffix, dir0, dir1, dir2, ...) are 
worse than reserved keywords, in the sense that they can never be used in a 
query, even with backquotes or a table alias prefix, because they are the 
names of invisible (implicit) columns.
{code:java}
SELECT t.`filename` FROM tmp.`myfile.csvh` t;
{code}
This query works, but the filename column is filled only with the name of 
the file (myfile.csvh) and not with the value of the 'filename' field of the CSV.

Even more insidious, with a CSV file with 3 columns (filename, length, date):
{code:java}
SELECT * FROM tmp.`myfile.csvh` t;
|  length   | date  |
+-----------+-------+
| ...
{code}
In this case the column (the filename column of the file) is +dropped silently+. So this 
behaviour is potentially very dangerous.

There are solutions to this problem:
 * If you know the structure of the file well, it is possible to use 
TABLE
{code:java}
SELECT columns[0] AS `filename` FROM TABLE(tmp.`myfile.csvh` (type => 'text', 
fieldDelimiter => ',', extractHeader => false, skipFirstLine => true))t;
{code}

 * Since 1.10.0 and DRILL-5762 there are options to change the value of certain 
labels (but not all)
 ** drill.exec.storage.implicit.filename.column.label
 ** drill.exec.storage.implicit.filepath.column.label
 ** drill.exec.storage.implicit.fqn.column.label
 ** drill.exec.storage.implicit.suffix.column.label
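
As a minimal sketch of this second workaround: assuming the option can be set at session level with ALTER SESSION (the scope is not stated above), and using 'drill_filename' as a purely illustrative label, the implicit column could be renamed so that `filename` refers to the CSV field again:
{code:sql}
-- Assumption: the option is settable per session; 'drill_filename' is a hypothetical label.
ALTER SESSION SET `drill.exec.storage.implicit.filename.column.label` = 'drill_filename';

-- `filename` should now resolve to the column from the CSV header,
-- while the implicit file name is exposed under the new label.
SELECT t.`filename`, t.`drill_filename` FROM tmp.`myfile.csvh` t;
{code}
This has to be repeated for every label that collides, and it does not cover dir0, dir1, ..., which are not among the options listed above.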

Why not change the default names to avoid the (very probable) collision with such 
common names, by directly adding a prefix or suffix ("drill_" for example)? Moreover, it 
might "protect" the other names (dir0, dir1, ...) a little.

 

> CSV having header column by name "suffix" fails to load
> -------------------------------------------------------
>
>                 Key: DRILL-5762
>                 URL: https://issues.apache.org/jira/browse/DRILL-5762
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Text & CSV
>    Affects Versions: 1.11.0
>            Reporter: Praveen Yadav
>            Priority: Major
>
> Trying a select * query on the CSV file below using Apache Drill 1.11.0.
> {code:none}
> id,email,first_name,last_name,middle_name,suffix,work_phone,mobile_phone,gender,picture,speciality,taxonomy_code,education_details,experience_details,keywords,doctor_npi,wait_time,created_tstamp,created_by,last_updated_tstamp,last_updated_by,is_deleted
> 1,[email protected],XXXXX,XXXX,,Dr,912225711234,,M,assets/images/doctorIcon.png,Primary Care Physician,Primary Care Doctor,M.D,3 years,Primary Care Doctor,12349765,10,2015-04-22 17:20:48.0,,2015-12-16 12:06:27.0,,N
> 2,[email protected],XXXX,XXXX,,Dr,913345311234,,M,assets/images/doctorIcon.png,Eye Doctor,EYE Care Doctor,MD,5 years,,16456076,20,2015-04-30 11:07:57.0,,2015-11-07 08:49:57.0,,N
> {code}
> I get this error :
> {noformat}
> Error: DATA_READ ERROR: Error processing input: , line=1, char=286. Content 
> parsed: [ ]
> Failure while reading file file:somepath/file.csv. Happened at or shortly 
> before byte position 286.
> Fragment 0:0
> [Error Id: 1fff3645-e788-4ec3-b678-bea86a39003c on praveens-mbp.lan:31010] 
> (state=,code=0)
> {noformat}
> Solution:
> Replacing column name "suffix" with any other text fixed the error.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
