Github user fmethot commented on a diff in the pull request:

    https://github.com/apache/drill/pull/593#discussion_r80169931
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/TextReader.java ---
    @@ -231,33 +231,34 @@ private void parseQuotedValue(byte prev) throws IOException {
         final TextInput input = this.input;
         final byte quote = this.quote;
     
    -    ch = input.nextChar();
    +    try {
    +      input.setMonitorForNewLine(false);
    +      ch = input.nextChar();
     
    -    while (!(prev == quote && (ch == delimiter || ch == newLine || isWhite(ch)))) {
    -      if (ch != quote) {
    -        if (prev == quote) { // unescaped quote detected
    -          if (parseUnescapedQuotes) {
    -            output.append(quote);
    -            output.append(ch);
    -            parseQuotedValue(ch);
    -            break;
    -          } else {
    -            throw new TextParsingException(
    -                context,
    -                "Unescaped quote character '"
    -                    + quote
    -                    + "' inside quoted value of CSV field. To allow unescaped quotes, set 'parseUnescapedQuotes' to 'true' in the CSV parser settings. Cannot parse CSV input.");
    +      while (!(prev == quote && (ch == delimiter || ch == newLine || isWhite(ch)))) {
    +        if (ch != quote) {
    +          if (prev == quote) { // unescaped quote detected
    +            if (parseUnescapedQuotes) {
    +              output.append(quote);
    +              output.append(ch);
    +              parseQuotedValue(ch);
    +              break;
    +            } else {
    +              throw new TextParsingException(context, "Unescaped quote character '" + quote + "' inside quoted value of CSV field. To allow unescaped quotes, set 'parseUnescapedQuotes' to 'true' in the CSV parser settings. Cannot parse CSV input.");
    +            }
               }
    +          output.append(ch);
    +          prev = ch;
    +        } else if (prev == quoteEscape) {
    +          output.append(quote);
    +          prev = NULL_BYTE;
    +        } else {
    +          prev = ch;
             }
    -        output.append(ch);
    -        prev = ch;
    -      } else if (prev == quoteEscape) {
    -        output.append(quote);
    -        prev = NULL_BYTE;
    -      } else {
    -        prev = ch;
    +        ch = input.nextChar();
           }
    -      ch = input.nextChar();
    +    } finally {
    --- End diff --
    
    The finally block guarantees that the flag is always restored, which matters because input.getChar() is called from everywhere.
    - We could remove the finally block, assuming that when an exception occurs the TextReader simply stops parsing and exits (so input.getChar() is never called again). Please advise if that is the approach we should take.
    - In a custom version of the CompliantTextRecordReader that we use internally, we are resilient to errors within rows: once an exception occurs, we recover at the next line and keep parsing. The finally clause ensures the TextInput is in a proper state after a failure.
    - My only worry is the extra work needed to handle try/finally in a method that gets called hundreds of thousands of times per second per thread. I haven't tested it, but the compiler is presumably doing a good job of optimizing this.
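
    To illustrate the recovery argument above, here is a minimal sketch of the try/finally pattern. It is not the Drill code: SketchInput is a hypothetical stand-in for TextInput, isMonitoringNewLine() is an invented accessor, and restoring the flag to true in the finally block is an assumption, since the diff above is cut off before the body of the finally block.

```java
// Hypothetical stand-in for Drill's TextInput; only the flag handling matters here.
final class SketchInput {
  private final String data;
  private int pos = 0;
  private boolean monitorForNewLine = true;

  SketchInput(String data) {
    this.data = data;
  }

  void setMonitorForNewLine(boolean value) {
    monitorForNewLine = value;
  }

  // Invented accessor, used only to observe the flag in this sketch.
  boolean isMonitoringNewLine() {
    return monitorForNewLine;
  }

  char nextChar() {
    if (pos >= data.length()) {
      throw new IllegalStateException("unexpected end of input");
    }
    return data.charAt(pos++);
  }
}

final class FlagRestoreSketch {
  // Reads characters up to the closing quote; the opening quote is assumed
  // to have been consumed by the caller.
  static String parseQuotedValue(SketchInput input) {
    StringBuilder out = new StringBuilder();
    input.setMonitorForNewLine(false);
    try {
      char ch = input.nextChar();
      while (ch != '"') {
        out.append(ch);
        ch = input.nextChar();
      }
      return out.toString();
    } finally {
      // Assumption: the flag goes back to true. This runs on both the
      // normal return and the exception path.
      input.setMonitorForNewLine(true);
    }
  }

  public static void main(String[] args) {
    SketchInput ok = new SketchInput("abc\"");
    System.out.println(parseQuotedValue(ok) + " monitoring=" + ok.isMonitoringNewLine());

    SketchInput bad = new SketchInput("abc"); // closing quote is missing
    try {
      parseQuotedValue(bad);
    } catch (IllegalStateException e) {
      // A caller that skips to the next row finds the input in a sane state.
      System.out.println("recovered, monitoring=" + bad.isMonitoringNewLine());
    }
  }
}
```

    Without the finally block, the second case would leave the flag disabled, which only matters if the caller keeps reading from the same input after the exception.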

