[ 
https://issues.apache.org/jira/browse/DRILL-4653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15512183#comment-15512183
 ] 

ASF GitHub Bot commented on DRILL-4653:
---------------------------------------

Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/518
  
    Looks like you are right; the JsonParser is more than a simple tokenizer.
    
    We're not the first to try this: 
http://stackoverflow.com/questions/37511496/recover-from-malformed-json-with-jackson
 (no answer)
    
    I tried an experiment and found that you are on the right track: the way 
you are using the JsonParser can be extended to ignore input until the start of 
the next object. A quick demonstration:
    
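        // Needs com.fasterxml.jackson.core.{JsonParser, JsonToken, JsonParseException}
        // and java.io.IOException.
        private static void recover(JsonParser parser) throws IOException {
          // True when the previous token was "}" and was read without error.
          boolean afterEndObject = false;
          for ( ; ; ) {
            JsonToken token;
            try {
              token = parser.nextToken();
            } catch( JsonParseException e ) { afterEndObject = false; continue; }
            if ( token == null ) return;  // end of input
            // Stop once "{" directly follows "}": the next record has started.
            if ( afterEndObject && token == JsonToken.START_OBJECT ) { return; }
            afterEndObject = ( token == JsonToken.END_OBJECT );
          }
        }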
    
    Basically, we keep reading tokens, and ignoring errors, until we 
successfully find the } { pair.
    
    As we discussed before, to use the above in Drill, we have to discard the 
partly-built record, and start reading the next record assuming the parser is 
positioned **after** the START_OBJECT ("{") token, which we've already 
consumed. That should be simple.
    
    Still, to do proper recovery, we have to discard the partly-built JSON 
record. I've not looked into how to do that. If we don't do that, we return the 
bogus partly-built record. Worse, if we recover by trying to build a new 
record, we create more partly-built records, but with a different schema, 
possibly triggering a schema change event when not really necessary.
    
    Any ideas for how to solve that problem?
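
    One possible direction, sketched in plain Java rather than against Drill's 
actual writer classes: stage the fields of the current record and publish them 
only once END_OBJECT is reached cleanly. Every name below (StagedRecordWriter, 
setField, commit, discard) is hypothetical and exists only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from Drill.
final class StagedRecordWriter {
  private final List<Map<String, Object>> committed = new ArrayList<>();
  private Map<String, Object> staged = new LinkedHashMap<>();

  // Called for each field parsed from the current JSON object.
  void setField(String name, Object value) { staged.put(name, value); }

  // Called when END_OBJECT is reached without a parse error.
  void commit() {
    committed.add(staged);
    staged = new LinkedHashMap<>();
  }

  // Called when the parser throws mid-record: drop everything staged so far.
  void discard() { staged = new LinkedHashMap<>(); }

  List<Map<String, Object>> records() { return committed; }
}

final class StagedRecordDemo {
  public static void main(String[] args) {
    StagedRecordWriter writer = new StagedRecordWriter();
    writer.setField("a", 1);
    writer.commit();               // well-formed record
    writer.setField("b", "oops");
    writer.discard();              // parse error before END_OBJECT
    writer.setField("a", 2);
    writer.commit();               // next well-formed record
    System.out.println(writer.records());  // prints [{a=1}, {a=2}]
  }
}
```

    In this sketch the field "b" from the bad record never reaches the 
committed output, so it cannot introduce a new column or a spurious schema 
change. Whether Drill's vector writers allow an equivalent rollback is exactly 
the open question.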



> Malformed JSON should not stop the entire query from progressing
> ----------------------------------------------------------------
>
>                 Key: DRILL-4653
>                 URL: https://issues.apache.org/jira/browse/DRILL-4653
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - JSON
>    Affects Versions: 1.6.0
>            Reporter: subbu srinivasan
>             Fix For: Future
>
>
> Currently a Drill query terminates upon the first encounter of an invalid 
> JSON line. Drill should continue progressing after ignoring the bad records. 
> Something similar to a setting such as (ignore.malformed.json) would help.



