benj created DRILL-7001:
---------------------------

             Summary: Documentation - renaming columns name in csv header
                 Key: DRILL-7001
                 URL: https://issues.apache.org/jira/browse/DRILL-7001
             Project: Apache Drill
          Issue Type: Wish
    Affects Versions: 1.15.0
            Reporter: benj

I don't know if this is the best place for this request, but some operations are applied that may change the column names when querying a csvh file (a CSV file with a header). These operations are not documented.

Although it is possible to read [HeaderBuilder.java|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/HeaderBuilder.java], it would be useful to add a section to the documentation that explains at least the principle behind these cases, to avoid needless problems and difficulties.

List of operations (possibly not exhaustive):
 * Trim() is applied to each CSV column name
{noformat}
Name , Age,PoB , Info => `Name`, `Age`, `PoB` and `Info`
{noformat}
 * Characters other than [a-zA-Z0-9_] are replaced by '_' (underscore)
{noformat}
Name,Sum$,em@il => `Name`, `Sum_`, `em_il`
{noformat}
 * Field names starting with '_' (underscore) are prefixed with 'col'
{noformat}
_name,_age_,pob_,_col_ => `col_name`, `col_age_`, `pob_`, `col_col_`
{noformat}
 * Field names starting with [^a-zA-Z] are prefixed with 'col_'
{noformat}
0_name, 1_age,@pob,#other1,'other2' => `col_0_name`, `col_1_age`, `col_pob`, `col_other1`, `col_other2_`
{noformat}
 * Quotation marks are removed
 * If the name is a single character:
 ** if it is in [a-zA-Z], do nothing
 ** else if it is in [0-9], prefix it with col_
 ** otherwise, rename it to column_[0-9]+, where [0-9]+ is the position of the column
 * Duplicate column names (case insensitive) are suffixed with _[0-9]+ (starting from "_2")
{noformat}
0_name,col_0_name,colx,COLX,colx,colx_2 => `col_0_name`, `col_0_name_2`, `colx`, `COLX_2`, `colx_3`, `colx_2_2`
{noformat}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
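
The renaming rules listed in this report can be sketched in Python as follows. This is only an illustration built from the examples in the report, not a port of HeaderBuilder.java, and Drill's real behavior may differ in cases the report does not cover. The function name `rename_headers` is hypothetical. It is also an assumption that Python's `csv.reader` can stand in for Drill's own header parsing (including the removal of double quotes) and that column positions are counted from 1.

```python
import csv
import re


def rename_headers(header_line):
    """Apply the renaming rules described in the report to one header line.

    Hypothetical helper: it mirrors the report's examples, not Drill's code.
    """
    # Assumption: csv.reader approximates Drill's parsing and drops double quotes.
    raw_names = next(csv.reader([header_line]))
    seen = set()    # lower-cased names already emitted, for duplicate detection
    result = []
    for position, raw in enumerate(raw_names, start=1):  # assumed 1-based
        name = raw.strip()  # trim surrounding whitespace
        if len(name) == 1 and not re.fullmatch(r"[A-Za-z0-9]", name):
            # A single character that is neither a letter nor a digit.
            name = f"column_{position}"
        else:
            # Replace every character outside [a-zA-Z0-9_] with an underscore.
            name = re.sub(r"[^A-Za-z0-9_]", "_", name)
            if name.startswith("_"):
                name = "col" + name
            elif name[:1].isdigit():
                name = "col_" + name
        # Case-insensitive de-duplication: append _2, _3, ... until unique.
        candidate, suffix = name, 2
        while candidate.lower() in seen:
            candidate = f"{name}_{suffix}"
            suffix += 1
        seen.add(candidate.lower())
        result.append(candidate)
    return result


if __name__ == "__main__":
    print(rename_headers("0_name,col_0_name,colx,COLX,colx,colx_2"))
```

Empty column names are not covered by the report, so the sketch leaves them unchanged rather than guessing what Drill does with them.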