[ https://issues.apache.org/jira/browse/DRILL-6096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955586#comment-16955586 ]
ASF GitHub Bot commented on DRILL-6096:
---------------------------------------

paul-rogers commented on issue #1873: DRILL-6096: Provide mechanism to configure text writer configuration
URL: https://github.com/apache/drill/pull/1873#issuecomment-544277734

General question: the idea of giving the user more control is a good one. I'm seeing this need in several recent plugins. I wonder, however, whether we should offer a more scalable solution.

We currently have plugin, session, config, and table function options, all of which control some aspect of reading and writing. The plugin and config options really want to be "bulk" or "slow-moving" settings: set them once for a system or file format.

Session options have two problems. First, they introduce client state: they must be set and reset around each query that needs them. This prevents load balancing and fail-over. (By contrast, Impala keeps options in the client and sends them to the server with each query, which allows unlimited round-robin load balancing and transparent fail-over.) Second, session options are awkward for many tools: in addition to running a query, the client must send `ALTER SESSION` statements that are somehow associated with that query. That is easy for our unit tests, but not so easy for most BI tools.

Table options look like what we want, but they have a number of issues, including that, when used, they "erase" any options set in the format plugin.

Is there some other solution we could consider? Better table options? Some syntax that allows setting options in the query? Some kind of view or schema that encapsulates these options so that they don't have to be provided on every query? I don't have a solid suggestion; I'm simply pointing out the ease-of-use issue.
> Provide mechanisms to specify field delimiters and quoted text for
> TextRecordWriter
> -----------------------------------------------------------------------------------
>
> Key: DRILL-6096
> URL: https://issues.apache.org/jira/browse/DRILL-6096
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Text & CSV
> Affects Versions: 1.12.0
> Reporter: Kunal Khatua
> Assignee: Arina Ielchiieva
> Priority: Major
> Labels: doc-impacting, ready-to-commit
> Fix For: 1.17.0
>
>
> Currently, there is no way for a user to specify the field delimiter when
> writing records as text output. Furthermore, if the fields contain the
> delimiter, there is no mechanism for specifying quotes.
> By default, quotes should be used to enclose non-numeric fields being written.
>
> *Description of the implemented changes:*
> Two options are added to control text writer output:
> {{store.text.writer.add_header}} - indicates whether a header should be added to
> the created text file. Default is true.
> {{store.text.writer.force_quotes}} - indicates whether all values should be quoted.
> Default is false, which means only values that contain special characters (line
> / field separators) will be quoted.
> Line / field separators and quote / escape characters can be configured in the
> text format configuration using the Web UI. The user can create a special format
> only for writing data and then use it when creating files. Such a format can
> always be used to read the written data back as well.
> {noformat}
> "formats": {
>   "write_text": {
>     "type": "text",
>     "extensions": [
>       "txt"
>     ],
>     "lineDelimiter": "\n",
>     "fieldDelimiter": "!",
>     "quote": "^",
>     "escape": "^"
>   }
> },
> ...
> {noformat}
> Next, set the specified format and create a text file:
> {noformat}
> alter session set `store.format` = 'write_text';
> create table dfs.tmp.t as select 1 as id from (values(1));
> {noformat}
> Notes:
> 1. Data is written using univocity-parsers, which limits the line separator to
> at most 2 characters. Drill allows setting a longer line separator, since it can
> read data split by a line separator of any length, but an exception will be
> thrown when data is written with such a separator.
> 2. {{extractHeader}} in the text format configuration does not affect whether a
> header is written to the text file; only {{store.text.writer.add_header}}
> controls this. {{extractHeader}} is used only when reading data.
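The write-side rules described in the issue can be sketched in Java as follows. This is a minimal, hypothetical illustration, not Drill's {{TextRecordWriter}} and not the univocity-parsers API: the class {{TextRowFormatter}}, its constructor, the quoting of header names, and the choice to escape an embedded quote character by prefixing it with the escape character are all assumptions. The sketch quotes a value only when it contains the field separator or a line separator character (or always, when forced, standing in for {{store.text.writer.force_quotes}}), writes a header only when asked (standing in for {{store.text.writer.add_header}}, with {{extractHeader}} playing no part), and rejects a line separator longer than 2 characters, as in note 1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the text-writer rules described in DRILL-6096.
// Not Drill's TextRecordWriter and not the univocity-parsers API.
class TextRowFormatter {
    private final String lineDelimiter;
    private final String fieldDelimiter;
    private final char quote;
    private final char escape;
    private final boolean forceQuotes; // stands in for store.text.writer.force_quotes

    TextRowFormatter(String lineDelimiter, String fieldDelimiter,
                     char quote, char escape, boolean forceQuotes) {
        // Mirrors note 1: a line separator longer than 2 characters is rejected on write.
        if (lineDelimiter.isEmpty() || lineDelimiter.length() > 2) {
            throw new IllegalArgumentException(
                "Line separator must be 1 or 2 characters long, got " + lineDelimiter.length());
        }
        this.lineDelimiter = lineDelimiter;
        this.fieldDelimiter = fieldDelimiter;
        this.quote = quote;
        this.escape = escape;
        this.forceQuotes = forceQuotes;
    }

    // A value is "special" if it contains the field separator or any line separator character.
    private boolean hasSpecialCharacters(String value) {
        if (value.contains(fieldDelimiter)) {
            return true;
        }
        for (char c : lineDelimiter.toCharArray()) {
            if (value.indexOf(c) >= 0) {
                return true;
            }
        }
        return false;
    }

    String formatValue(String value) {
        if (!forceQuotes && !hasSpecialCharacters(value)) {
            return value; // written as-is, without quotes
        }
        StringBuilder sb = new StringBuilder().append(quote);
        for (char c : value.toCharArray()) {
            if (c == quote) {
                sb.append(escape); // assumption: embedded quotes are prefixed with the escape char
            }
            sb.append(c);
        }
        return sb.append(quote).toString();
    }

    String formatRow(List<String> values) {
        List<String> formatted = new ArrayList<>();
        for (String value : values) {
            formatted.add(formatValue(value));
        }
        return String.join(fieldDelimiter, formatted) + lineDelimiter;
    }

    // addHeader stands in for store.text.writer.add_header; extractHeader is never consulted.
    // Assumption: header names go through the same quoting rule as data values.
    String format(List<String> columns, List<List<String>> rows, boolean addHeader) {
        StringBuilder sb = new StringBuilder();
        if (addHeader) {
            sb.append(formatRow(columns));
        }
        for (List<String> row : rows) {
            sb.append(formatRow(row));
        }
        return sb.toString();
    }
}
```

With the {{write_text}} settings from the issue ({{!}} as field separator, {{^}} as both quote and escape character, quoting not forced), a row of {{1}} and {{a!b}} would be written as {{1!^a!b^}}; with forced quoting, the value {{x^y}} would become {{^x^^y^}}.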