[
https://issues.apache.org/jira/browse/DRILL-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15277803#comment-15277803
]
Arina Ielchiieva commented on DRILL-3149:
-----------------------------------------
After the fix:
1. Multibyte line delimiters will be supported.
For example, to treat "\r\n" as the line delimiter, we can either update the storage
plugin by adding "lineDelimiter": "\r\n", or use a select with options query:
{code}select * from table(dfs.`my_table`(type=>'text',
lineDelimiter=>'\r\n')){code}
Note that *\n* is still treated as a standard delimiter, so if the file contains
newlines, records will also be split on them even when lineDelimiter is overridden.
Example:
Data set:
{noformat}a|||b\nc|||d{noformat}
Select:
{code}select * from table(dfs.`my_table`(type=>'text',
lineDelimiter=>'|||')){code}
Result:
{noformat}
a
b
c
d
{noformat}
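The splitting rule in the example above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not Drill's actual text reader; the helper name split_records is hypothetical.

```python
import re


def split_records(data: str, line_delimiter: str) -> list[str]:
    """Split data on the custom delimiter and also on '\n'.

    Hypothetical helper that mimics the behavior described in the comment;
    it is not taken from Drill's code base.
    """
    # Put the custom delimiter first so that "\r\n" is matched as a whole
    # before the bare "\n" alternative is tried.
    delimiters = dict.fromkeys([line_delimiter, "\n"])
    pattern = "|".join(re.escape(d) for d in delimiters)
    return re.split(pattern, data)


# Same data set as in the example: a|||b\nc|||d
print(split_records("a|||b\nc|||d", "|||"))
```

The custom delimiter is tried before "\n" on purpose: with lineDelimiter set to "\r\n", a "\r\n" sequence is then consumed as one delimiter instead of leaving a stray "\r" at the end of the record.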
2. Select with options will honor Java character escape sequences (e.g. \r, \n, \t).
Queries that use them will work correctly:
{code}select * from table(dfs.`my_table`(type=>'text',
lineDelimiter=>'\r\n', fieldDelimiter=>'\t')){code}
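To make "honoring escape sequences" concrete, here is a minimal Python sketch of how an option value such as '\r\n' (backslash followed by a letter, as typed in the query) could be turned into the real control characters. It assumes only the escapes named above plus the escaped backslash; the function name unescape is hypothetical and this is not the code used in Drill.

```python
import re

# Escapes assumed for this sketch: \r, \n, \t and an escaped backslash.
_ESCAPES = {"r": "\r", "n": "\n", "t": "\t", "\\": "\\"}


def unescape(value: str) -> str:
    """Replace backslash escapes with the characters they stand for.

    Unknown escapes are left untouched rather than rejected.
    """
    return re.sub(
        r"\\(.)",
        lambda m: _ESCAPES.get(m.group(1), m.group(0)),
        value,
    )


# The option text as written in the query is four characters: \ r \ n
line_delimiter = unescape(r"\r\n")
field_delimiter = unescape(r"\t")
print(repr(line_delimiter), repr(field_delimiter))
```

Scanning left to right with a single regular expression keeps an escaped backslash from being misread, so the text \\n stays a backslash followed by the letter n.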
> TextReader should support multibyte line delimiters
> ---------------------------------------------------
>
> Key: DRILL-3149
> URL: https://issues.apache.org/jira/browse/DRILL-3149
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Text & CSV
> Affects Versions: 1.0.0, 1.1.0
> Reporter: Jim Scott
> Assignee: Arina Ielchiieva
> Priority: Minor
> Fix For: Future
>
>
> lineDelimiter in the TextFormatConfig doesn't support \r\n for record
> delimiters.