[
https://issues.apache.org/jira/browse/DRILL-6096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vova Vysotskyi updated DRILL-6096:
----------------------------------
Labels: doc-impacting ready-to-commit (was: doc-impacting)
> Provide mechanisms to specify field delimiters and quoted text for
> TextRecordWriter
> -----------------------------------------------------------------------------------
>
> Key: DRILL-6096
> URL: https://issues.apache.org/jira/browse/DRILL-6096
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Text & CSV
> Affects Versions: 1.12.0
> Reporter: Kunal Khatua
> Assignee: Arina Ielchiieva
> Priority: Major
> Labels: doc-impacting, ready-to-commit
> Fix For: 1.17.0
>
>
> Currently, there is no way for a user to specify the field delimiter when
> writing records as text output. Furthermore, if the fields contain the
> delimiter, we have no mechanism for specifying quotes.
> By default, quotes should be used to enclose non-numeric fields being written.
> *Description of the implemented changes:*
> Two options are added to control text writer output:
> {{store.text.writer.add_header}} - indicates whether a header should be added
> to the created text file. Default is true.
> {{store.text.writer.force_quotes}} - indicates whether all values should be
> quoted. Default is false, which means only values that contain special
> characters (line / field separators) will be quoted.
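> As a minimal sketch of how these two options might be changed (assuming they
> are set at session level like other Drill options, which the description does
> not spell out):
> {noformat}
> -- do not write a header line into the created text files
> alter session set `store.text.writer.add_header` = false;
> -- quote every value, not only those containing special characters
> alter session set `store.text.writer.force_quotes` = true;
> {noformat}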
> Line / field separators and quote / escape characters can be configured in
> the text format configuration using the Web UI. A user can create a special
> format only for writing data and then use it when creating files. Such a
> format can also always be used to read back the written data.
> {noformat}
> "formats": {
>   "write_text": {
>     "type": "text",
>     "extensions": [
>       "txt"
>     ],
>     "lineDelimiter": "\n",
>     "fieldDelimiter": "!",
>     "quote": "^",
>     "escape": "^"
>   }
> },
> ...
> {noformat}
> Next, set the specified format and create a text file:
> {noformat}
> alter session set `store.format` = 'write_text';
> create table dfs.tmp.t as select 1 as id from (values(1));
> {noformat}
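> The written data could then presumably be read back through the same format.
> A sketch, assuming {{write_text}} is registered in the {{dfs}} plugin used
> above:
> {noformat}
> -- text files read without extracted headers expose fields via the `columns` array
> select columns[0] as id from dfs.tmp.t;
> {noformat}
> Since the sample format does not set {{extractHeader}}, a header line written
> under the default {{store.text.writer.add_header}} = true would likely come
> back as an ordinary row.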
> Notes:
> 1. Data is written using univocity-parsers, which limit the line separator
> length to at most 2 characters. Drill allows setting a line separator longer
> than 2 characters, since it can read data split by a line separator of any
> length, but an exception will be thrown during data write.
> 2. {{extractHeader}} in the text format configuration does not affect whether
> a header is written to the text file; only {{store.text.writer.add_header}}
> controls this. {{extractHeader}} is used only when reading the data.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)