[
https://issues.apache.org/jira/browse/DRILL-4653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15512183#comment-15512183
]
ASF GitHub Bot commented on DRILL-4653:
---------------------------------------
Github user paul-rogers commented on the issue:
https://github.com/apache/drill/pull/518
Looks like you are right; the JsonParser is more than a simple tokenizer.
We're not the first to try this:
http://stackoverflow.com/questions/37511496/recover-from-malformed-json-with-jackson
(no answer)
I tried an experiment and found that you are on the right track: the way
you are using the JsonParser can be extended to ignore input until the start of
the next object. A quick demonstration:
    // Skip tokens (and parse errors) until a "}" is immediately followed by "{".
    private static void recover(JsonParser parser) throws IOException {
      for ( ; ; ) {
        JsonToken token;
        try {
          token = parser.nextToken();
        } catch ( JsonParseException e ) {
          continue; // Ignore malformed input and keep scanning.
        }
        if ( token == null ) return; // End of input.
        if ( token != JsonToken.END_OBJECT ) { continue; }
        token = parser.nextToken();
        if ( token == null ) return;
        if ( token == JsonToken.START_OBJECT ) { return; } // Found the } { pair.
      }
    }
Basically, we keep reading tokens, and ignoring errors, until we
successfully find the } { pair.
As we discussed before, to use the above in Drill, we have to discard the
partly-built record, and start reading the next record assuming the parser is
positioned **after** the START_OBJECT ("{") token, which we've already
consumed. That should be simple.
Still, to do proper recovery, we have to discard the partly-built JSON
record. I've not looked into how to do that. If we don't do that, we return the
bogus partly-built record. Worse, if we recover by trying to build a new
record, we create more partly-built records, but with a different schema,
possibly triggering a schema change event when not really necessary.
Any ideas for how to solve that problem?
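One possible direction, offered only as a rough sketch: stage each record's
fields in a scratch buffer and publish them only once the closing "}" has been
read without error, so a failed record can be dropped before it affects the
output or its schema. The class below, StagedRecordWriter, and all of its
methods are hypothetical names made up for illustration; they are not Drill
classes, and Drill's real record writers are not modeled here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a record writer; not part of Drill.
class StagedRecordWriter {
    // Records that parsed cleanly and are visible downstream.
    private final List<Map<String, Object>> committed = new ArrayList<>();
    // Fields of the record currently being read.
    private Map<String, Object> staging = new LinkedHashMap<>();

    // Buffer a field of the in-progress record.
    void writeField(String name, Object value) {
        staging.put(name, value);
    }

    // Call after END_OBJECT was read successfully: publish the record.
    void commitRecord() {
        committed.add(staging);
        staging = new LinkedHashMap<>();
    }

    // Call from the error path (before recovery): drop the partial record.
    void discardRecord() {
        staging = new LinkedHashMap<>();
    }

    List<Map<String, Object>> records() {
        return committed;
    }

    public static void main(String[] args) {
        StagedRecordWriter writer = new StagedRecordWriter();
        writer.writeField("a", 1);
        writer.commitRecord();
        // Simulate a record that fails part-way through parsing.
        writer.writeField("a", 2);
        writer.writeField("b", "partial");
        writer.discardRecord();
        writer.writeField("a", 3);
        writer.commitRecord();
        System.out.println(writer.records()); // prints [{a=1}, {a=3}]
    }
}
```

Because the bad record's extra field "b" never reaches the committed list, the
committed records keep a single schema. Whether something equivalent can be
done cheaply against Drill's own writers is the open question.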
> Malformed JSON should not stop the entire query from progressing
> ----------------------------------------------------------------
>
> Key: DRILL-4653
> URL: https://issues.apache.org/jira/browse/DRILL-4653
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - JSON
> Affects Versions: 1.6.0
> Reporter: subbu srinivasan
> Fix For: Future
>
>
> Currently a Drill query terminates upon the first encounter of an invalid JSON line.
> Drill has to continue progressing after ignoring the bad records. Something
> similar to a setting of (ignore.malformed.json) would help.