[ 
https://issues.apache.org/jira/browse/DRILL-6840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16681693#comment-16681693
 ] 

Mariano Ruiz commented on DRILL-6840:
-------------------------------------

Thanks [~arina],

Now that I better understand the difference between Drill and SqlLine, I see 
that you are right.

Still, maybe we should create another ticket. I have been thinking a bit more 
about how the exporter (not the recorder) works when you run a statement like:
{code:java}
CREATE TABLE dfs.tmp. ...
{code}
This is not just a missing feature in the CSV exporter, it is a bug in itself, 
because it produces CSV files that are parsed incorrectly. Let me explain:

CSV doesn't force you to enclose strings in the " character; in fact, as far as 
I know, there is no single clear standard. But if a cell contains text with the 
same character that is used to separate the columns, enclosing that text in " 
is required. I found the issue because I had a column with values like 
*_Smartwatch XYZ, Black_* (note the comma in the text), followed by other 
columns. Because Drill doesn't enclose this cell in the " character, any CSV 
interpreter, such as an Office tool or a Java library, reads the value as two 
cells instead of one.
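
To illustrate the misparse, here is a minimal Java sketch. It assumes a naive 
reader that simply splits each line on the separator, which is effectively what 
a CSV parser does with a line that has no quoted cells. The class name 
`UnquotedCsvDemo`, the helper `splitCells`, and the second column value `42` 
are made up for this example; they are not taken from Drill or from my data.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class UnquotedCsvDemo {
    // Hypothetical naive reader: splits a line on the separator.
    // The limit of -1 keeps trailing empty cells instead of dropping them.
    static String[] splitCells(String line, String separator) {
        return line.split(Pattern.quote(separator), -1);
    }

    public static void main(String[] args) {
        // Two logical cells, but the first one contains the separator
        // and is written without enclosing quotes.
        String line = "Smartwatch XYZ, Black,42";
        String[] cells = splitCells(line, ",");
        // The reader sees 3 cells instead of the intended 2.
        System.out.println(cells.length + " cells: " + Arrays.toString(cells));
    }
}
```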

So I can understand that Drill doesn't have, and maybe won't have, a setting to 
configure whether cells are always enclosed. Even so, it should always enclose 
any cell whose value contains the comma character (or whichever separator is 
in use), without the need for any configuration.
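
A rough sketch of the rule I have in mind, written in Java. This is not Drill's 
writer code; `CsvQuoting`, `quoteIfNeeded` and `toCsvLine` are hypothetical 
names. Besides the separator, the sketch also quotes cells that contain a quote 
character or a line break, and doubles embedded quotes, which is the common 
convention as far as I know. Those extra cases are my assumption, not something 
discussed in this ticket.

```java
class CsvQuoting {
    // Hypothetical helper: encloses a cell in double quotes only when needed.
    static String quoteIfNeeded(String value, char separator) {
        boolean needsQuotes = value.indexOf(separator) >= 0
                || value.indexOf('"') >= 0
                || value.indexOf('\n') >= 0
                || value.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return value;
        }
        // Double any embedded quote characters, then wrap the whole cell.
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Hypothetical helper: joins cells into one CSV line with minimal quoting.
    static String toCsvLine(char separator, String... cells) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < cells.length; i++) {
            if (i > 0) {
                line.append(separator);
            }
            line.append(quoteIfNeeded(cells[i], separator));
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // The value 42 is a made-up second column for illustration.
        System.out.println(toCsvLine(',', "Smartwatch XYZ, Black", "42"));
        // prints: "Smartwatch XYZ, Black",42
    }
}
```

With this rule, cells without the separator are written unchanged, so existing 
output only differs for the rows that are currently broken.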

What do you think? Do you know whether this was considered before?

> Exporting to CSV using !set csvquotecharacter '"' not working in latest 
> stable or snapshot versions
> ---------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-6840
>                 URL: https://issues.apache.org/jira/browse/DRILL-6840
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Text & CSV
>    Affects Versions: 1.14.0
>         Environment: * Tested with latest version *Apache Drill* 1.14.0, and 
> building the latest version from master (Github repo), commit 
> ad61c6bc1dd24994e50fe7dfed043d5e57dba8f9 at _Nov 5, 2018_.
>  * *Linux* x64, Ubuntu 16.04
>  * *OpenJDK* Runtime Environment (build 
> 1.8.0_171-8u171-b11-0ubuntu0.17.10.1-b11)
>  * Apache *Maven* 3.5.0
>            Reporter: Mariano Ruiz
>            Priority: Minor
>              Labels: csv, csvparser, export
>
> Using latest stable version and latest SNAPSHOT version, when I export to a 
> CSV file the result of a query, the text fields aren't enclosed with double 
> quotes as specified.
> Steps:
> {code:java}
> 0: jdbc:drill:zk=local> USE dfs.tmp;
> +-------+--------------------------------------+
> |  ok   |               summary                |
> +-------+--------------------------------------+
> | true  | Default schema changed to [dfs.tmp]  |
> +-------+--------------------------------------+
> 1 row selected (0.126 seconds)
> 0: jdbc:drill:zk=local> ALTER SESSION SET `store.format`='csv';
> +-------+------------------------+
> |  ok   |        summary         |
> +-------+------------------------+
> | true  | store.format updated.  |
> +-------+------------------------+
> 1 row selected (0.117 seconds)
> 0: jdbc:drill:zk=local> !set csvquotecharacter '"'
> 0: jdbc:drill:zk=local> CREATE TABLE dfs.tmp.prods_without_brand AS SELECT * 
> FROM dfs.`/tmp/prods.csv` WHERE brand = '';
> +-----------+----------------------------+
> | Fragment  | Number of records written  |
> +-----------+----------------------------+
> | 0_0       | 112                        |
> +-----------+----------------------------+
> 1 row selected (0.198 seconds)
> 0: jdbc:drill:zk=local> 
> {code}
> The CSV output doesn't have any field enclosed with *{color:red}"{color}*, 
> even those that have values with the *{color:red},{color}* character, so the 
> CSV is broken.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
