ASF GitHub Bot commented on DRILL-4653:

Github user paul-rogers commented on the issue:

    Looks like you are right; the JsonParser is more than a simple tokenizer.
    We're not the first to try this: 
 (no answer)
    I tried an experiment and found that you are on the right track: the way 
you are using the JsonParser can be extended to ignore input until the start of 
the next object. A quick demonstration:
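        // Needs com.fasterxml.jackson.core.{JsonParser, JsonParseException, JsonToken}
        // and java.io.IOException. Skips input until a "}" followed by a "{";
        // on a normal return the parser has just consumed that "{".
        private static void recover(JsonParser parser) throws IOException {
          JsonToken prev = null;  // last token read without an error
          for ( ; ; ) {
            JsonToken token;
            try {
              token = parser.nextToken();
            } catch( JsonParseException e ) {
              continue;  // Ignore the error; prev still holds the last good token.
            }
            if ( token == null ) { return; }  // End of input.
            if ( prev == JsonToken.END_OBJECT && token == JsonToken.START_OBJECT ) {
              return;  // Found the "} {" pair.
            }
            prev = token;
          }
        }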
    Basically, we keep reading tokens, ignoring errors, until we find a "}" 
token immediately followed by a "{" token.
    As we discussed before, to use the above in Drill, we have to discard the 
partly-built record and start reading the next record assuming the parser is 
positioned **after** the START_OBJECT ("{") token, which we've already 
consumed. That should be simple.
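The resume step can be sketched with only the JDK, so it runs without Jackson. Everything here is hypothetical and simplified: `Tok` stands in for Jackson's `JsonToken`, an `ERROR` entry in the token list simulates a `JsonParseException` (surfaced as an `IllegalStateException`), records are flat, and the `ignoreMalformed` flag plays the role of the `ignore.malformed.json` setting proposed in the issue. `recover()` mirrors the Jackson loop above, with an extra `prev` argument so that an error between records resynchronizes on the very next `{`. The key detail is the `atStart` flag: once `recover()` succeeds it has already consumed the `{`, so the next iteration must not try to read it again.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a JSON token stream; not Drill's or Jackson's API.
final class RecoveringReader {
  // Simplified token kinds; ERROR simulates a parse failure at that position.
  enum Tok { START_OBJECT, END_OBJECT, FIELD, VALUE, ERROR }

  static final class Tokens {
    private final Iterator<Tok> it;
    Tokens(List<Tok> toks) { this.it = toks.iterator(); }

    /** Next token, or null at end of input; throws where the input is "malformed". */
    Tok next() {
      if (!it.hasNext()) return null;
      Tok t = it.next();
      if (t == Tok.ERROR) throw new IllegalStateException("malformed JSON");
      return t;
    }
  }

  /** Counts of records read and records skipped. */
  static final class Result {
    final int good;
    final int skipped;
    Result(int good, int skipped) { this.good = good; this.skipped = skipped; }
  }

  /** Skips input until "} {"; returns true if that "{" was found (and consumed). */
  static boolean recover(Tokens in, Tok prev) {
    for (;;) {
      Tok t;
      try {
        t = in.next();
      } catch (IllegalStateException e) {
        continue;                  // ignore the error; prev keeps the last good token
      }
      if (t == null) return false; // end of input
      if (prev == Tok.END_OBJECT && t == Tok.START_OBJECT) return true;
      prev = t;
    }
  }

  /** Reads flat records; skips malformed ones when ignoreMalformed is true. */
  static Result readAll(Tokens in, boolean ignoreMalformed) {
    int good = 0;
    int skipped = 0;
    boolean atStart = false;   // true when recover() already consumed the "{"
    boolean inRecord = false;  // true once a record's "{" has been consumed
    for (;;) {
      try {
        if (!atStart) {
          Tok t = in.next();
          if (t == null) break;    // clean end of input
          if (t != Tok.START_OBJECT) throw new IllegalStateException("expected {");
        }
        atStart = false;
        inRecord = true;
        // Read the record body up to its closing "}".
        for (Tok t = in.next(); t != Tok.END_OBJECT; t = in.next()) {
          if (t == null) throw new IllegalStateException("truncated record");
        }
        inRecord = false;
        good++;                    // a real reader would commit the record here
      } catch (IllegalStateException e) {
        if (!ignoreMalformed) throw e;
        skipped++;                 // a real reader would discard the partial record here
        // Between records, any "{" starts a new record; inside one, wait for "} {".
        atStart = recover(in, inRecord ? null : Tok.END_OBJECT);
        inRecord = false;
        if (!atStart) break;       // nothing usable left after the bad record
      }
    }
    return new Result(good, skipped);
  }
}
```

With the tokens `{ F V } { F <error> V } { F V }`, `readAll(..., true)` reports two good records and one skipped record, while `readAll(..., false)` lets the first error propagate. The `} {` heuristic has a limit: if a malformed record is missing its own closing `}`, the record that follows it is consumed during recovery and is lost without being counted.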
    Still, to do proper recovery, we have to discard the partly-built JSON 
record, and I've not looked into how to do that. If we don't discard it, we 
return the bogus partly-built record. Worse, if we recover by starting a new 
record, we can end up with more partly-built records, each possibly with a 
different schema, triggering a schema change event that isn't really necessary.
    Any ideas for how to solve that problem?
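One possible direction is to write each record's values tentatively and then either commit or roll them back. The sketch below is a hypothetical, in-memory stand-in: it does not use Drill's value vectors or writer classes, and `ColumnarBatch`, `set()`, `commitRecord()` and `rollbackRecord()` are illustrative names only. Rolling back also drops any column that was first created by the bad record, so the batch's schema reflects only committed records, which is the property needed to avoid the unnecessary schema change described above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory batch, not Drill's value-vector API.
final class ColumnarBatch {
  // Insertion order matters: columns added after the last commit are at the end.
  private final Map<String, List<Object>> columns = new LinkedHashMap<>();
  private int committedRows = 0;     // rows that are final
  private int committedColumns = 0;  // columns that existed at the last commit

  /** Writes a field of the record being built (a repeated field keeps the last value). */
  void set(String name, Object value) {
    List<Object> col = columns.get(name);
    if (col == null) {
      col = new ArrayList<>();
      for (int i = 0; i < committedRows; i++) col.add(null);  // back-fill earlier rows
      columns.put(name, col);
    }
    if (col.size() > committedRows) col.set(committedRows, value);
    else col.add(value);
  }

  /** Makes the current record final; fields it lacked become null. */
  void commitRecord() {
    for (List<Object> col : columns.values()) {
      if (col.size() == committedRows) col.add(null);
    }
    committedRows++;
    committedColumns = columns.size();
  }

  /** Discards the current record, including any column it introduced. */
  void rollbackRecord() {
    Iterator<List<Object>> it = columns.values().iterator();
    int index = 0;
    while (it.hasNext()) {
      List<Object> col = it.next();
      if (index++ >= committedColumns) {
        it.remove();                  // column first seen in the bad record
      } else if (col.size() > committedRows) {
        col.remove(col.size() - 1);   // undo this record's value
      }
    }
  }

  int rowCount() { return committedRows; }

  Set<String> columnNames() { return new LinkedHashSet<>(columns.keySet()); }

  List<Object> column(String name) { return new ArrayList<>(columns.get(name)); }
}
```

A reader would call `commitRecord()` after each record's closing `}` and `rollbackRecord()` in its error path before calling `recover()`. A real implementation would have to do the equivalent truncation on Drill's own vectors, which this sketch does not attempt.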

> Malformed JSON should not stop the entire query from progressing
> ----------------------------------------------------------------
>                 Key: DRILL-4653
>                 URL: https://issues.apache.org/jira/browse/DRILL-4653
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - JSON
>    Affects Versions: 1.6.0
>            Reporter: subbu srinivasan
>             Fix For: Future
> Currently, a Drill query terminates on the first invalid JSON line it 
> encounters. Drill should instead skip the bad records and keep going. A 
> setting such as ignore.malformed.json would help.

This message was sent by Atlassian JIRA
