[ https://issues.apache.org/jira/browse/DRILL-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15553520#comment-15553520 ]
ASF GitHub Bot commented on DRILL-3178:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/593#discussion_r82304834

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/TextReader.java ---
    @@ -231,33 +231,34 @@ private void parseQuotedValue(byte prev) throws IOException {
         final TextInput input = this.input;
         final byte quote = this.quote;
    -    ch = input.nextChar();
    +    try {
    +      input.setMonitorForNewLine(false);
    +      ch = input.nextChar();
    -    while (!(prev == quote && (ch == delimiter || ch == newLine || isWhite(ch)))) {
    -      if (ch != quote) {
    -        if (prev == quote) { // unescaped quote detected
    -          if (parseUnescapedQuotes) {
    -            output.append(quote);
    -            output.append(ch);
    -            parseQuotedValue(ch);
    -            break;
    -          } else {
    -            throw new TextParsingException(
    -                context,
    -                "Unescaped quote character '"
    -                    + quote
    -                    + "' inside quoted value of CSV field. To allow unescaped quotes, set 'parseUnescapedQuotes' to 'true' in the CSV parser settings. Cannot parse CSV input.");
    +      while (!(prev == quote && (ch == delimiter || ch == newLine || isWhite(ch)))) {
    +        if (ch != quote) {
    +          if (prev == quote) { // unescaped quote detected
    +            if (parseUnescapedQuotes) {
    +              output.append(quote);
    +              output.append(ch);
    +              parseQuotedValue(ch);
    +              break;
    +            } else {
    +              throw new TextParsingException(context, "Unescaped quote character '" + quote + "' inside quoted value of CSV field. To allow unescaped quotes, set 'parseUnescapedQuotes' to 'true' in the CSV parser settings. Cannot parse CSV input.");
    +            }
               }
    +          output.append(ch);
    +          prev = ch;
    +        } else if (prev == quoteEscape) {
    +          output.append(quote);
    +          prev = NULL_BYTE;
    +        } else {
    +          prev = ch;
             }
    -        output.append(ch);
    -        prev = ch;
    -      } else if (prev == quoteEscape) {
    -        output.append(quote);
    -        prev = NULL_BYTE;
    -      } else {
    -        prev = ch;
    +        ch = input.nextChar();
           }
    -      ch = input.nextChar();
    +    } finally {
    --- End diff --

    I see why it is done in finally.
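
    For illustration only, the try/finally idiom in the diff above can be sketched as below. This is a minimal sketch under stated assumptions, not Drill's code: SketchInput and QuotedValueSketch are hypothetical stand-ins for TextInput and TextReader, the meaning given to the flag (counting record breaks) is invented for the example, and restoring the flag to true in finally is an assumption, since the quoted diff ends before the body of the finally block.

```java
// Hypothetical stand-in for a character source; NOT Drill's TextInput.
final class SketchInput {
  private final String data;
  private int pos = 0;
  private boolean monitorForNewLine = true;
  private int recordBreaks = 0;

  SketchInput(String data) {
    this.data = data;
  }

  void setMonitorForNewLine(boolean monitor) {
    this.monitorForNewLine = monitor;
  }

  boolean isMonitoringNewLine() {
    return monitorForNewLine;
  }

  int recordBreaks() {
    return recordBreaks;
  }

  // Returns the next character; a newline counts as a record break
  // only while monitoring is switched on (assumed semantics).
  char nextChar() {
    if (pos >= data.length()) {
      throw new IllegalStateException("end of input");
    }
    char c = data.charAt(pos++);
    if (c == '\n' && monitorForNewLine) {
      recordBreaks++;
    }
    return c;
  }
}

final class QuotedValueSketch {
  // Reads up to the closing quote. The opening quote is assumed to have
  // been consumed already, and escape handling is deliberately left out.
  static String readQuoted(SketchInput input, char quote) {
    StringBuilder out = new StringBuilder();
    try {
      input.setMonitorForNewLine(false); // newlines inside quotes are data
      for (char ch = input.nextChar(); ch != quote; ch = input.nextChar()) {
        out.append(ch);
      }
    } finally {
      // Assumed reset: runs even if nextChar() throws, so the flag
      // cannot stay switched off after a parse error.
      input.setMonitorForNewLine(true);
    }
    return out.toString();
  }
}
```

    The point of finally here is that an unterminated quoted value makes nextChar() throw partway through the loop, and the flag is still restored on that path.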
    However, as noted above, I'm not sure that pushing this kind of flag into the getChar function is the optimal approach...

> csv reader should allow newlines inside quotes
> -----------------------------------------------
>
>                 Key: DRILL-3178
>                 URL: https://issues.apache.org/jira/browse/DRILL-3178
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Text & CSV
>    Affects Versions: 1.0.0
>         Environment: Ubuntu Trusty 14.04.2 LTS
>            Reporter: Neal McBurnett
>            Assignee: F Méthot
>             Fix For: Future
>
>         Attachments: drill-3178.patch
>
>
> When reading a csv file which contains newlines within quoted strings, e.g. via
> select * from dfs.`/tmp/q.csv`;
> Drill 1.0 says:
> Error: SYSTEM ERROR: com.univocity.parsers.common.TextParsingException: Error processing input: Cannot use newline character within quoted string
> But many tools produce csv files with newlines in quoted strings. Drill should be able to handle them.
> Workaround: the csvquote program (https://github.com/dbro/csvquote) can encode embedded commas and newlines, and even decode them later if desired.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)