[ 
https://issues.apache.org/jira/browse/DRILL-6096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6096:
------------------------------------
    Description: 
Currently, there is no way for a user to specify the field delimiter when 
writing records as text output. Furthermore, if the fields contain the 
delimiter, there is no mechanism for specifying quotes.

By default, quotes should be used to enclose non-numeric fields being written.

Description of implemented changes:

Two options are added to control text writer output:
{{store.text.writer.add_header}} - indicates whether a header should be added to 
the created text file. Default is true.
{{store.text.writer.force_quotes}} - indicates whether all values should be quoted. 
Default is false, which means only values that contain special characters (line / 
field separators) will be quoted.
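
As a minimal sketch, assuming these options can be set at session level in the 
same way as {{store.format}} and that they accept boolean values, overriding 
both defaults might look like this:

{noformat}
-- Assumption: both options are session-level booleans.
-- Do not write a header line to the created text file (default is true).
alter session set `store.text.writer.add_header` = false;
-- Quote every value, not only those containing separators (default is false).
alter session set `store.text.writer.force_quotes` = true;
{noformat}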

Line / field separators and quote / escape characters can be configured in the 
text format configuration using the Web UI. A user can create a special format 
only for writing data and then use it when creating files. Such a format can 
also always be used to read back the written data.

{noformat}
  "formats": {
    "write_text": {
      "type": "text",
      "extensions": [
        "txt"
      ],
      "lineDelimiter": "\n",
      "fieldDelimiter": "!",
      "quote": "^",
      "escape": "^"
    }
   },
...
{noformat}

Next, set the specified format and create a text file:
{noformat}
alter session set `store.format` = 'write_text';
create table dfs.tmp.t as select 1 as id from (values(1));
{noformat}

Notes:
1. The Univocity writer is used to write data. It limits the line separator to 
at most 2 chars. Although Drill allows setting a line separator longer than 2 
chars, an exception will be thrown during data write.
2. {{extractHeader}} in the text format configuration does not affect whether a 
header is written to the text file; only {{store.text.writer.add_header}} 
controls this. {{extractHeader}} is used only when reading data.


  was:
Currently, there is no way for a user to specify the field delimiter when 
writing records as text output. Furthermore, if the fields contain the 
delimiter, there is no mechanism for specifying quotes.

By default, quotes should be used to enclose non-numeric fields being written.

Description of implemented changes:

Two options are added to control text writer output:
{{store.text.writer.add_header}} - indicates whether a header should be added to 
the created text file. Default is true.
{{store.text.writer.force_quotes}} - indicates whether all values should be quoted. 
Default is false, which means only values that contain special characters (line / 
field separators) will be quoted.

Line / field separators and quote / escape characters can be configured in the 
text format configuration using the Web UI. A user can create a special format 
only for writing data and then use it when creating files.

{noformat}
  "formats": {
    "write_text": {
      "type": "text",
      "extensions": [
        "txt"
      ],

      "lineDelimiter": "|",
      "fieldDelimiter": "|",
      "quote": "|",
      "escape": "|",
      "lineDelimiter": "|",

   public String lineDelimiter = "\n";
    public char fieldDelimiter = '\n';
    public char quote = '"';
    public char escape = '"';
    public char comment = '#';
    public boolean skipFirstLine = false;
    public boolean extractHeader = false;
    }
   },
...
{noformat}




> Provide mechanisms to specify field delimiters and quoted text for 
> TextRecordWriter
> -----------------------------------------------------------------------------------
>
>                 Key: DRILL-6096
>                 URL: https://issues.apache.org/jira/browse/DRILL-6096
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Text & CSV
>    Affects Versions: 1.12.0
>            Reporter: Kunal Khatua
>            Assignee: Arina Ielchiieva
>            Priority: Major
>              Labels: doc-impacting
>             Fix For: 1.17.0



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
