GitHub user fmethot commented on a diff in the pull request:

    https://github.com/apache/drill/pull/593#discussion_r80169931

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/TextReader.java ---
    @@ -231,33 +231,34 @@ private void parseQuotedValue(byte prev) throws IOException {
         final TextInput input = this.input;
         final byte quote = this.quote;
    -    ch = input.nextChar();
    +    try {
    +      input.setMonitorForNewLine(false);
    +      ch = input.nextChar();
    -    while (!(prev == quote && (ch == delimiter || ch == newLine || isWhite(ch)))) {
    -      if (ch != quote) {
    -        if (prev == quote) { // unescaped quote detected
    -          if (parseUnescapedQuotes) {
    -            output.append(quote);
    -            output.append(ch);
    -            parseQuotedValue(ch);
    -            break;
    -          } else {
    -            throw new TextParsingException(
    -                context,
    -                "Unescaped quote character '"
    -                    + quote
    -                    + "' inside quoted value of CSV field. To allow unescaped quotes, set 'parseUnescapedQuotes' to 'true' in the CSV parser settings. Cannot parse CSV input.");
    +      while (!(prev == quote && (ch == delimiter || ch == newLine || isWhite(ch)))) {
    +        if (ch != quote) {
    +          if (prev == quote) { // unescaped quote detected
    +            if (parseUnescapedQuotes) {
    +              output.append(quote);
    +              output.append(ch);
    +              parseQuotedValue(ch);
    +              break;
    +            } else {
    +              throw new TextParsingException(context, "Unescaped quote character '" + quote + "' inside quoted value of CSV field. To allow unescaped quotes, set 'parseUnescapedQuotes' to 'true' in the CSV parser settings. Cannot parse CSV input.");
    +            }
               }
    +          output.append(ch);
    +          prev = ch;
    +        } else if (prev == quoteEscape) {
    +          output.append(quote);
    +          prev = NULL_BYTE;
    +        } else {
    +          prev = ch;
             }
    -        output.append(ch);
    -        prev = ch;
    -      } else if (prev == quoteEscape) {
    -        output.append(quote);
    -        prev = NULL_BYTE;
    -      } else {
    -        prev = ch;
    +        ch = input.nextChar();
           }
    -      ch = input.nextChar();
    +    } finally {
    --- End diff --

    The finally block always runs, and that matters here: the flag must always be set back to false, because input.getChar() is called from everywhere.

    - We could remove the finally, assuming that when an exception occurs the TextReader just stops parsing and exits (input.getChar() never gets called again). Please advise if that is the way we should go.
    - In a custom version of the CompliantTextRecordReader that we use internally, we are resilient to errors within rows: once an exception occurs, we are able to recover at the next line and keep parsing. The finally clause ensures the TextInput is in a proper state after a failure.
    - The only thing I am worried about is the extra work required to handle try/finally in a method that gets called hundreds of thousands of times per second per thread. I haven't tested it, though the compiler must be doing a good job of optimizing these.
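To make the recovery argument concrete, here is a minimal, self-contained sketch of the pattern rather than the actual Drill code. `SketchInput` and `QuotedValueSketch` are hypothetical stand-ins, and because the diff above is cut off at `finally {`, the value the real block restores is not visible; the sketch simply assumes that newline monitoring is switched back on.

```java
// Hypothetical stand-in for a text input whose newline monitoring can be toggled.
// It only mirrors the idea of TextInput; none of this is Drill's API.
final class SketchInput {
  private final String data;
  private int pos = 0;
  private boolean monitorForNewLine = true; // assumed default that other callers rely on
  private int lineCount = 0;

  SketchInput(String data) { this.data = data; }

  void setMonitorForNewLine(boolean on) { monitorForNewLine = on; }
  boolean isMonitoringNewLine() { return monitorForNewLine; }
  boolean hasNext() { return pos < data.length(); }
  int lineCount() { return lineCount; }

  char nextChar() {
    char c = data.charAt(pos++);
    if (monitorForNewLine && c == '\n') {
      lineCount++; // newlines only count as record boundaries while monitoring is on
    }
    return c;
  }
}

final class QuotedValueSketch {
  // Reads up to the closing quote; the opening quote is assumed to be consumed already.
  static String parseQuoted(SketchInput in) {
    StringBuilder out = new StringBuilder();
    try {
      in.setMonitorForNewLine(false); // newlines inside quotes are plain data
      while (true) {
        if (!in.hasNext()) {
          throw new IllegalStateException("unterminated quoted value");
        }
        char c = in.nextChar();
        if (c == '"') {
          return out.toString();
        }
        out.append(c);
      }
    } finally {
      // Runs on both the return path and the exception path, so a caller that
      // catches the exception and skips to the next line sees a consistent input.
      in.setMonitorForNewLine(true);
    }
  }
}
```

A caller that catches the exception from `parseQuoted` can keep using the same `SketchInput`, since `isMonitoringNewLine()` is `true` again whether the call returned or threw. On the cost side, the JVM handles a non-throwing try/finally through an exception table rather than per-call setup, so the normal path mainly pays for the finally body itself; a benchmark on the real reader would still be needed to confirm that here.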