[ https://issues.apache.org/jira/browse/DRILL-4653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15512183#comment-15512183 ]

ASF GitHub Bot commented on DRILL-4653:
---------------------------------------

Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/518

    Looks like you are right; the JsonParser is more than a simple tokenizer.

    We're not the first to try this:
    http://stackoverflow.com/questions/37511496/recover-from-malformed-json-with-jackson
    (no answer)

    I tried an experiment and found that you are on the right track: the way
    you are using the JsonParser can be extended to ignore input until the
    start of the next object. A quick demonstration:

        private static void recover(JsonParser parser) throws IOException {
          for ( ; ; ) {
            JsonToken token;
            try {
              token = parser.nextToken();
            } catch( JsonParseException e ) {
              // Ignore the malformed input and keep scanning.
              continue;
            }
            if ( token == null ) return;   // End of input.
            if ( token != JsonToken.END_OBJECT ) {
              continue;
            }
            // Found "}"; check whether the next token opens a new object.
            token = parser.nextToken();
            if ( token == null ) return;
            if ( token == JsonToken.START_OBJECT ) {
              return;
            }
          }
        }

    Basically, we keep reading tokens, and ignoring errors, until we
    successfully find the } { pair.

    As we discussed before, to use the above in Drill, we have to discard the
    partly-built record and start reading the next record, assuming the parser
    is positioned **after** the START_OBJECT ("{") token, which we've already
    consumed. That should be simple.

    Still, to do proper recovery, we have to discard the partly-built JSON
    record. I've not looked into how to do that. If we don't do that, we
    return the bogus partly-built record. Worse, if we recover by trying to
    build a new record, we create more partly-built records, but with a
    different schema, possibly triggering a schema change event when not
    really necessary.

    Any ideas for how to solve that problem?

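One possible direction for the partly-built record problem, offered only as a rough sketch: stage each record's fields in a temporary buffer and commit them only when the closing END_OBJECT is reached, so that a parse error simply drops the staged fields. The class `StagedRecordWriter` and all of its methods are hypothetical names invented for illustration; this is not Drill's writer API, and a real implementation would have to work against Drill's value vectors rather than in-memory maps.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not Drill code): fields are staged per record and
 * only become visible once the record is known to be complete.
 */
class StagedRecordWriter {
    // Records that were read completely and are safe to return.
    private final List<Map<String, Object>> committed = new ArrayList<>();
    // Fields of the record currently being built; null when idle.
    private Map<String, Object> staging = null;

    /** Called when START_OBJECT is seen. */
    void startRecord() {
        staging = new LinkedHashMap<>();
    }

    /** Called for each field parsed inside the current record. */
    void put(String field, Object value) {
        if (staging == null) {
            throw new IllegalStateException("no record in progress");
        }
        staging.put(field, value);
    }

    /** Called when END_OBJECT is seen: the record is complete, so commit it. */
    void endRecord() {
        if (staging == null) {
            throw new IllegalStateException("no record in progress");
        }
        committed.add(staging);
        staging = null;
    }

    /** Called after a parse error: drop whatever was staged so far. */
    void discardRecord() {
        staging = null;
    }

    List<Map<String, Object>> committed() {
        return committed;
    }
}
```

In this sketch, the reader would call `discardRecord()` in the same place it calls `recover(parser)`, then `startRecord()` again, since the parser is already positioned after the next START_OBJECT. Because the bad record never reaches the committed set, its stray fields could not contribute to a schema change in this model.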

> Malformed JSON should not stop the entire query from progressing
> ----------------------------------------------------------------
>
>                 Key: DRILL-4653
>                 URL: https://issues.apache.org/jira/browse/DRILL-4653
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - JSON
>    Affects Versions: 1.6.0
>            Reporter: subbu srinivasan
>             Fix For: Future
>
>
> Currently, a Drill query terminates upon the first encounter of an invalid
> JSON line. Drill has to continue progressing after ignoring the bad records.
> Something similar to a setting of (ignore.malformed.json) would help.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)