Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/518
  
    Looks like you are right; the JsonParser is more than a simple tokenizer.
    
    We're not the first to try this (the question below has no answer):
    http://stackoverflow.com/questions/37511496/recover-from-malformed-json-with-jackson
    
    I tried an experiment and found that you are on the right track: the way 
you are using the JsonParser can be extended to ignore input until the start of 
the next object. A quick demonstration:
    
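        import java.io.IOException;
        import com.fasterxml.jackson.core.JsonParseException;
        import com.fasterxml.jackson.core.JsonParser;
        import com.fasterxml.jackson.core.JsonToken;

        // Skips input until a "{" that directly follows a "}", leaving the
        // parser positioned just after that START_OBJECT token.
        private static void recover(JsonParser parser) throws IOException {
          JsonToken prev = null;
          for ( ; ; ) {
            JsonToken token;
            try {
              token = parser.nextToken();
            } catch( JsonParseException e ) {
              prev = null;   // ignore the error; a "}" before it doesn't count
              continue;
            }
            if ( token == null ) return;   // end of input
            if ( prev == JsonToken.END_OBJECT && token == JsonToken.START_OBJECT ) {
              return;   // found the "} {" pair
            }
            prev = token;   // remember it so a nested "} } {" still matches
          }
        }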
    
    Basically, we keep reading tokens, and ignoring errors, until we 
successfully find the } { pair.
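To check the "} {" logic without a Jackson dependency, here is a minimal sketch over a hypothetical token model. `Tok`, `TokenRecovery` and the `ERROR` token are illustrative names, not Jackson's API; `ERROR` stands in for a point where `nextToken()` would have thrown. The sketch tracks the previous token, so a `}` closing a nested object, followed by the outer `}` and then `{`, is still recognized.

```java
import java.util.Iterator;

// Hypothetical token model standing in for Jackson's JsonToken. ERROR marks a
// point where the real parser's nextToken() would have thrown.
enum Tok { START_OBJECT, END_OBJECT, FIELD_NAME, VALUE, ERROR }

final class TokenRecovery {
  /**
   * Consumes tokens until a START_OBJECT that directly follows an END_OBJECT.
   * Returns how many tokens were consumed, or -1 if the input ran out first.
   * On success the next record's "{" has already been consumed.
   */
  static int recover(Iterator<Tok> tokens) {
    Tok prev = null;
    int consumed = 0;
    while (tokens.hasNext()) {
      Tok tok = tokens.next();
      consumed++;
      if (tok == Tok.ERROR) {
        prev = null;  // skip the bad input and forget any pending "}"
        continue;
      }
      if (prev == Tok.END_OBJECT && tok == Tok.START_OBJECT) {
        return consumed;  // found the "} {" pair
      }
      prev = tok;
    }
    return -1;
  }
}
```

For example, the stream `ERROR VALUE } } { FIELD_NAME ...` stops after the fifth token, leaving the iterator at the next record's first field. Jackson reports no token for commas, so an array of objects such as `[{...}, {...}]` also produces END_OBJECT followed by START_OBJECT; the pair is a heuristic for a record boundary, not a guarantee.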
    
    As we discussed before, to use the above in Drill, we have to discard the 
partly-built record and start reading the next record assuming the parser is 
positioned **after** the START_OBJECT ("{") token, which we've already 
consumed. That should be simple.
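As a sketch of the "start after the `{`" idea, assuming flat records of scalar fields and a hypothetical token type `Tk` (not Jackson's `JsonToken`), a record reader can take a flag saying whether the recovery loop has already consumed the opening brace. `FlatRecordReader` and `readRecord` are likewise illustrative names:

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical token type for flat records; not Jackson's JsonToken.
final class Tk {
  enum Kind { START_OBJECT, END_OBJECT, FIELD_NAME, VALUE }

  final Kind kind;
  final String text;

  Tk(Kind kind, String text) { this.kind = kind; this.text = text; }

  static Tk start() { return new Tk(Kind.START_OBJECT, "{"); }
  static Tk end() { return new Tk(Kind.END_OBJECT, "}"); }
  static Tk field(String name) { return new Tk(Kind.FIELD_NAME, name); }
  static Tk value(String v) { return new Tk(Kind.VALUE, v); }
}

final class FlatRecordReader {
  /**
   * Reads one flat record of scalar fields. When startConsumed is true, the
   * caller (for example a recovery loop) has already read the record's "{".
   */
  static Map<String, String> readRecord(Iterator<Tk> in, boolean startConsumed) {
    if (!startConsumed) {
      Tk first = in.next();
      if (first.kind != Tk.Kind.START_OBJECT) {
        throw new IllegalStateException("expected {, got " + first.text);
      }
    }
    Map<String, String> record = new LinkedHashMap<>();
    while (true) {
      Tk tok = in.next();
      if (tok.kind == Tk.Kind.END_OBJECT) {
        return record;  // record closed cleanly
      }
      if (tok.kind != Tk.Kind.FIELD_NAME) {
        throw new IllegalStateException("expected field name, got " + tok.text);
      }
      Tk value = in.next();
      if (value.kind != Tk.Kind.VALUE) {
        throw new IllegalStateException("expected scalar value, got " + value.text);
      }
      record.put(tok.text, value.text);
    }
  }
}
```

The normal read path passes `false`; the path right after recovery passes `true`, so the same field loop serves both.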
    
    Still, to do proper recovery, we have to discard the partly-built JSON 
record, and I've not looked into how to do that. If we don't, we return the 
bogus partly-built record. Worse, if we recover by trying to build a new 
record, we create more partly-built records, but with a different schema, 
possibly triggering an unnecessary schema change event.
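One hypothetical way to get all-or-nothing records is to stage each record's fields in a side buffer and publish them, together with any new column names, only when the record closes cleanly. `StagedBatch` is an illustrative name, not Drill's writer API; the sketch only shows the commit/abort idea on plain Java collections:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical staging writer: fields of the record being parsed go into a
 * pending buffer and only reach the batch (and its schema) on commit().
 */
final class StagedBatch {
  private final Set<String> schema = new LinkedHashSet<>();
  private final List<Map<String, String>> rows = new ArrayList<>();
  private Map<String, String> pending = new LinkedHashMap<>();

  void setField(String name, String value) { pending.put(name, value); }

  /** Record parsed cleanly: publish its row and any new columns. */
  void commit() {
    schema.addAll(pending.keySet());
    rows.add(pending);
    pending = new LinkedHashMap<>();
  }

  /** Parse error: drop the partly-built record; the schema is untouched. */
  void abort() { pending = new LinkedHashMap<>(); }

  Set<String> schema() { return Collections.unmodifiableSet(schema); }

  List<Map<String, String>> rows() { return Collections.unmodifiableList(rows); }
}
```

Because a column first seen in a bad record never reaches `schema`, aborting cannot by itself cause a schema change. The trade-off is an extra copy per record, since every field is buffered before it reaches the batch.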
    
    Any ideas for how to solve that problem?


