[
https://issues.apache.org/jira/browse/DRILL-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15553520#comment-15553520
]
ASF GitHub Bot commented on DRILL-3178:
---------------------------------------
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/593#discussion_r82304834
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/TextReader.java ---
@@ -231,33 +231,34 @@ private void parseQuotedValue(byte prev) throws IOException {
final TextInput input = this.input;
final byte quote = this.quote;
- ch = input.nextChar();
+ try {
+ input.setMonitorForNewLine(false);
+ ch = input.nextChar();
- while (!(prev == quote && (ch == delimiter || ch == newLine || isWhite(ch)))) {
- if (ch != quote) {
- if (prev == quote) { // unescaped quote detected
- if (parseUnescapedQuotes) {
- output.append(quote);
- output.append(ch);
- parseQuotedValue(ch);
- break;
- } else {
- throw new TextParsingException(
- context,
- "Unescaped quote character '"
- + quote
- + "' inside quoted value of CSV field. To allow unescaped quotes, set 'parseUnescapedQuotes' to 'true' in the CSV parser settings. Cannot parse CSV input.");
+ while (!(prev == quote && (ch == delimiter || ch == newLine || isWhite(ch)))) {
+ if (ch != quote) {
+ if (prev == quote) { // unescaped quote detected
+ if (parseUnescapedQuotes) {
+ output.append(quote);
+ output.append(ch);
+ parseQuotedValue(ch);
+ break;
+ } else {
+ throw new TextParsingException(context, "Unescaped quote character '" + quote + "' inside quoted value of CSV field. To allow unescaped quotes, set 'parseUnescapedQuotes' to 'true' in the CSV parser settings. Cannot parse CSV input.");
+ }
}
+ output.append(ch);
+ prev = ch;
+ } else if (prev == quoteEscape) {
+ output.append(quote);
+ prev = NULL_BYTE;
+ } else {
+ prev = ch;
}
- output.append(ch);
- prev = ch;
- } else if (prev == quoteEscape) {
- output.append(quote);
- prev = NULL_BYTE;
- } else {
- prev = ch;
+ ch = input.nextChar();
}
- ch = input.nextChar();
+ } finally {
--- End diff --
I see why it is done in finally. However, as noted above, I'm not sure that
pushing this kind of flag into the getChar function is the optimal approach...
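
For readers following the thread, here is a minimal sketch of the try/finally pattern under discussion. It is not the code from the pull request: FakeInput is a hypothetical stand-in for TextInput, isMonitoringNewLine() exists only so the flag can be inspected, and the body of the finally block is an assumption, since the diff above is cut off right after "} finally {". The sketch only illustrates why the reset would sit in finally: the flag is restored even when reading fails partway through a quoted value.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class MonitorFlagSketch {

  /** Hypothetical stand-in for TextInput; only the newline-monitoring flag is modeled. */
  static final class FakeInput {
    private final byte[] data;
    private int pos = 0;
    private boolean monitorForNewLine = true;

    FakeInput(String s) {
      this.data = s.getBytes(StandardCharsets.US_ASCII);
    }

    void setMonitorForNewLine(boolean value) {
      monitorForNewLine = value;
    }

    // Accessor added for illustration only; not taken from the patch.
    boolean isMonitoringNewLine() {
      return monitorForNewLine;
    }

    byte nextChar() throws IOException {
      if (pos >= data.length) {
        throw new IOException("unexpected end of input");
      }
      return data[pos++];
    }
  }

  /** Reads up to the closing quote with newline monitoring switched off. */
  static String readQuoted(FakeInput input) throws IOException {
    StringBuilder out = new StringBuilder();
    try {
      input.setMonitorForNewLine(false);
      byte ch = input.nextChar();
      while (ch != '"') {
        out.append((char) ch);
        ch = input.nextChar();
      }
    } finally {
      // Assumed reset: runs on normal exit and when nextChar() throws.
      input.setMonitorForNewLine(true);
    }
    return out.toString();
  }

  public static void main(String[] args) {
    FakeInput truncated = new FakeInput("a\nb"); // no closing quote
    try {
      readQuoted(truncated);
    } catch (IOException e) {
      System.out.println("read failed: " + e.getMessage());
    }
    System.out.println("monitoring restored: " + truncated.isMonitoringNewLine());
  }
}
```

The cost of this design is the one raised in the comment: the input class has to carry parser state, and every caller that turns the flag off must remember to turn it back on.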
> csv reader should allow newlines inside quotes
> -----------------------------------------------
>
> Key: DRILL-3178
> URL: https://issues.apache.org/jira/browse/DRILL-3178
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Text & CSV
> Affects Versions: 1.0.0
> Environment: Ubuntu Trusty 14.04.2 LTS
> Reporter: Neal McBurnett
> Assignee: F Méthot
> Fix For: Future
>
> Attachments: drill-3178.patch
>
>
> When reading a csv file which contains newlines within quoted strings, e.g.
> via
> select * from dfs.`/tmp/q.csv`;
> Drill 1.0 says:
> Error: SYSTEM ERROR: com.univocity.parsers.common.TextParsingException:
> Error processing input: Cannot use newline character within quoted string
> But many tools produce csv files with newlines in quoted strings. Drill
> should be able to handle them.
> Workaround: the csvquote program (https://github.com/dbro/csvquote) can
> encode embedded commas and newlines, and even decode them later if desired.
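
To make the requested behavior concrete, the following is a rough sketch of record splitting that treats a newline inside double quotes as data rather than as a record terminator. It is an illustration only and is not Drill's reader: the class and method names are hypothetical, only '\n' is treated as a line ending, and quote handling is limited to the doubled-quote convention.

```java
import java.util.ArrayList;
import java.util.List;

public class QuotedNewlineSketch {

  /** Splits CSV text into records, keeping newlines that occur inside double quotes. */
  static List<String> splitRecords(String text) {
    List<String> records = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    boolean inQuotes = false;
    for (int i = 0; i < text.length(); i++) {
      char c = text.charAt(i);
      if (c == '"') {
        // A doubled quote ("") toggles twice, so the state is unchanged afterwards.
        inQuotes = !inQuotes;
        current.append(c);
      } else if (c == '\n' && !inQuotes) {
        // Only an unquoted newline ends the record.
        records.add(current.toString());
        current.setLength(0);
      } else {
        current.append(c);
      }
    }
    if (current.length() > 0) {
      records.add(current.toString());
    }
    return records;
  }

  public static void main(String[] args) {
    String csv = "id,note\n1,\"first line\nsecond line\"\n2,plain\n";
    for (String record : splitRecords(csv)) {
      System.out.println("[" + record + "]");
    }
  }
}
```

Tracking the quoted state in the splitter itself is one alternative to toggling a flag on the input object; which of the two fits Drill's reader better is the open question in the review comment above.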