[ 
https://issues.apache.org/jira/browse/DRILL-7001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16753836#comment-16753836
 ] 

benj commented on DRILL-7001:
-----------------------------

I think the most problematic behavior was the silent dropping of columns (not 
mentioned above), as described in DRILL-5762. But options now exist that make 
it possible to work around the problem in almost every case.

For the rest, apart from the missing documentation, just two small ideas:
 * Why not offer, in CSV-with-header mode, an "extra" way to request columns 
as in "without header" mode, by array access or by an alias column:
{code:java}
/* Already possible */
SELECT name, age, * FROM tmp.`my.csvh`;
/* Proposed: array access, as in no-header mode */
SELECT name, columns[1], * FROM tmp.`my.csvh`;
/* Proposed: access by the real column name, or by an alias generated from the position */
SELECT name, ALIAS_column_1, * FROM tmp.`my.csvh`;
{code}

 * Why not also allow, as an "extra", access to the real name through quoting 
(this doesn't solve the duplicate problem, but it allows additional flexibility):
{code:java}
SELECT 123 AS `_@ItsPossibleWhenNamingColumns!`;
{code}

Perhaps several of these possibilities could be offered at the same time to 
improve flexibility.

> Documentation - renaming columns name in csv header
> ---------------------------------------------------
>
>                 Key: DRILL-7001
>                 URL: https://issues.apache.org/jira/browse/DRILL-7001
>             Project: Apache Drill
>          Issue Type: Wish
>    Affects Versions: 1.15.0
>            Reporter: benj
>            Priority: Minor
>
> I don't know if this is the best place for this request, but
> some operations are performed that can change the column names 
> when querying a csvh file (CSV with header).
>  These operations are not documented.
>  Although it's possible to read 
> [HeaderBuilder.java|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/HeaderBuilder.java],
>  it would be useful to add a section to the documentation explaining at 
> least the principle of these different cases, to avoid needless 
> problems/difficulties.
> List of operations (possibly not exhaustive):
>  * Trim() on CSV column name
> {noformat}
>  Name , Age,PoB  , Info
> =>
> `Name`, `Age`, `PoB` and `Info`{noformat}
>  * Characters other than [a-zA-Z0-9_] are replaced by '_' (underscore)
> {noformat}
> Name,Sum$,em@il
> =>
> `Name`,`Sum_`,`em_il`{noformat}
>  * Field names starting with '_' (underscore) are prefixed with 'col'
> {noformat}
> _name,_age_,pob_,_col_
> =>
> `col_name`, `col_age_`, `pob_`, `col_col_`{noformat}
>  * Field names starting with [^a-zA-Z] are prefixed with 'col_'
> {noformat}
> 0_name, 1_age,@pob,#other1,'other2'
> =>
> `col_0_name`, `col_1_age`, `col_pob`, `col_other1`, `col_other2_`{noformat}
>  * Quotation marks are removed
>  * If the name is a single character:
>  ** if [a-zA-Z], do nothing
>  ** else if [0-9], prefix with col_
>  ** else rename to column_[0-9]+, where [0-9]+ denotes the position of the 
> column
>  * Duplicate column names (case-insensitive) are suffixed with _[0-9]+ 
> (starting from "_2")
> {noformat}
> 0_name,col_0_name,colx,COLX,colx,colx_2
> =>
> `col_0_name`, `col_0_name_2`, `colx`, `COLX_2`, `colx_3`, `colx_2_2`{noformat}
>  
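
To make the rules in the description above easier to check, here is a minimal Python sketch of them. It is only an illustration of the listed behavior, not the actual HeaderBuilder.java logic. The function names `sanitize` and `rename_headers` are hypothetical; the 1-based position, the handling of empty names, and the removal of only surrounding double quotes are assumptions; and the header line is split naively on commas (no quoted commas).

```python
import re


def sanitize(raw, position):
    """Map one raw header field to a column name, per the rules listed above.

    `position` is assumed to be the 1-based column index; the description
    only says the suffix is "the position of the column".
    """
    name = raw.strip()
    # Assumption: only surrounding double quotes are dropped. The example
    # 'other2' -> col_other2_ shows single quotes are replaced like any
    # other special character.
    if len(name) >= 2 and name[0] == '"' and name[-1] == '"':
        name = name[1:-1].strip()
    if len(name) <= 1:
        if re.fullmatch(r"[A-Za-z]", name):
            return name
        if re.fullmatch(r"[0-9]", name):
            return "col_" + name
        # Assumption: an empty name is handled like a lone special character.
        return f"column_{position}"
    # Anything outside [a-zA-Z0-9_] becomes an underscore.
    name = re.sub(r"[^A-Za-z0-9_]", "_", name)
    if name[0] == "_":
        return "col" + name       # _name  -> col_name
    if not re.match(r"[A-Za-z]", name):
        return "col_" + name      # 0_name -> col_0_name
    return name


def rename_headers(header_line):
    """Apply sanitize() to each field, then resolve duplicates.

    Duplicates are compared case-insensitively and get _2, _3, ... appended.
    """
    seen = set()
    result = []
    for position, raw in enumerate(header_line.split(","), start=1):
        name = sanitize(raw, position)
        candidate, n = name, 2
        while candidate.lower() in seen:
            candidate = f"{name}_{n}"
            n += 1
        seen.add(candidate.lower())
        result.append(candidate)
    return result


print(rename_headers("0_name,col_0_name,colx,COLX,colx,colx_2"))
# ['col_0_name', 'col_0_name_2', 'colx', 'COLX_2', 'colx_3', 'colx_2_2']
```

Under these assumptions the sketch reproduces every before/after example given in the description, including the duplicate case where `colx_2` collides with a previously generated suffix.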



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
