[ https://issues.apache.org/jira/browse/DRILL-3808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14877351#comment-14877351 ]
Jacques Nadeau commented on DRILL-3808:
---------------------------------------
Why would that affect anything else? The quote is configurable at the format
plugin level. Note here:
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/TextFormatPlugin.java#L134
Configuration options exist for the field and line delimiters as well as for
quote, escape, comment and skipFirstLine.
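To illustrate how per-format options could let a tsv format leave double quotes literal while a csv format keeps them special, here is a minimal sketch. It is NOT Drill's TextFormatConfig or TextReader: the `TextOptions` record, its `NO_QUOTE` sentinel (NUL meaning "no quote character"), and the `splitLine`/`parse` helpers are hypothetical names chosen for illustration. The option names only mirror the ones listed in the comment, and the sketch assumes quoted fields never span lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch: a line splitter driven by per-format text options. */
public class ConfigurableSplitter {

    /** Hypothetical per-format options; NO_QUOTE (an assumption) disables quote handling. */
    public record TextOptions(String lineDelimiter, char fieldDelimiter, char quote,
                              char escape, char comment, boolean skipFirstLine) {
        public static final char NO_QUOTE = '\0';
    }

    /** Splits one record into fields, honoring quote/escape only when a quote char is set. */
    public static List<String> splitLine(String line, TextOptions opts) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean quoting = opts.quote() != TextOptions.NO_QUOTE;
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == opts.escape() && i + 1 < line.length() && line.charAt(i + 1) == opts.quote()) {
                    cur.append(opts.quote()); // escaped quote inside a quoted field
                    i++;
                } else if (c == opts.quote()) {
                    inQuotes = false;         // closing quote
                } else {
                    cur.append(c);
                }
            } else if (quoting && c == opts.quote() && cur.length() == 0) {
                inQuotes = true;              // opening quote at the start of a field
            } else if (c == opts.fieldDelimiter()) {
                fields.add(cur.toString());   // delimiter outside quotes ends the field
                cur.setLength(0);
            } else {
                cur.append(c);                // ordinary character (or a literal quote)
            }
        }
        fields.add(cur.toString());
        return fields;
    }

    /** Splits text into records, applying skipFirstLine and comment-line handling. */
    public static List<List<String>> parse(String text, TextOptions opts) {
        List<List<String>> rows = new ArrayList<>();
        String[] lines = text.split(Pattern.quote(opts.lineDelimiter()), -1);
        for (int i = 0; i < lines.length; i++) {
            if (i == 0 && opts.skipFirstLine()) continue;
            String line = lines[i];
            if (line.isEmpty() || line.charAt(0) == opts.comment()) continue;
            rows.add(splitLine(line, opts));
        }
        return rows;
    }

    public static void main(String[] args) {
        String line = "\"test\"\t\"test\"";
        TextOptions csvStyle = new TextOptions("\n", '\t', '"', '"', '#', false);
        TextOptions literalQuote = new TextOptions("\n", '\t', TextOptions.NO_QUOTE, '"', '#', false);
        System.out.println(splitLine(line, csvStyle));     // [test, test]
        System.out.println(splitLine(line, literalQuote)); // ["test", "test"]
    }
}
```

The design choice here is that quote handling is a property of the format configuration rather than of the reader, so a tsv entry and a csv entry can differ without affecting each other.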
> Let TextReader have the option to treat double quote as a literal
> -----------------------------------------------------------------
>
> Key: DRILL-3808
> URL: https://issues.apache.org/jira/browse/DRILL-3808
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Text & CSV
> Reporter: Sean Hsuan-Yi Chu
> Assignee: Sean Hsuan-Yi Chu
> Priority: Critical
>
> According to references [1] and [2]:
> In .csv, the double quote is a special character because it can optionally enclose
> a text field. In .tsv, by contrast, it is not a special character: it can appear
> anywhere, and when it does, it should be treated as a literal. The tsv format
> specification also does not allow the tab or CR/LF characters to appear
> anywhere in text fields. However, Drill treats tsv exactly the same as csv.
> For example, given the data:
> {code}
> "test"\t"test"
> {code}
> For the query select columns[0], columns[1] from `t.tsv`; Drill returns:
> {code}
> test test
> {code}
> However, according to reference [2], the result should be:
> {code}
> "test" "test"
> {code}
> Ideally, Drill should follow the standard (see [2]).
> [1] CSV - https://tools.ietf.org/html/rfc4180
> [2] TSV - http://www.iana.org/assignments/media-types/text/tab-separated-values
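For comparison with the expected output in the issue, the following is a minimal sketch of a reader that follows the tsv rules as the issue summarizes them from [2]: one record per line, fields separated by tabs, and no quoting, so a double quote is just data. `StrictTsv` and `read` are hypothetical names, not part of Drill. Accepting CRLF as well as LF and skipping blank lines are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: tsv reading with no quote handling at all. */
public class StrictTsv {

    /** Splits text into LF- or CRLF-terminated records and records into tab-separated fields. */
    public static List<List<String>> read(String text) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : text.split("\r?\n")) {
            if (line.isEmpty()) continue;              // assumption: blank lines carry no record
            // Every tab ends a field; '"' is copied through unchanged.
            // The -1 limit keeps trailing empty fields.
            rows.add(List.of(line.split("\t", -1)));
        }
        return rows;
    }

    public static void main(String[] args) {
        for (List<String> row : read("\"test\"\t\"test\"\r\n")) {
            System.out.println(String.join(" ", row)); // "test" "test"
        }
    }
}
```

Because tab and line breaks always act as separators here, no field can contain them, which matches the restriction described in the issue.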
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)