[ https://issues.apache.org/jira/browse/DRILL-4653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15505198#comment-15505198 ]
ASF GitHub Bot commented on DRILL-4653:
---------------------------------------
Github user ssriniva123 commented on the issue:
https://github.com/apache/drill/pull/518
Apologies for getting back to this thread late; I got tied up with some
issues at work.
Paul,
The JSON parser is not just a tokenizer: it keeps track of the JSON
structure, understands contexts such as the root, array, and object
contexts, and does all of its parsing within the current context.
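To make the distinction concrete, here is a minimal Java sketch of the kind of context a parser keeps beyond a token stream. The class and method names are hypothetical, and this is not Drill's JsonReader or Jackson's parser. It keeps a stack of root/object/array contexts plus an in-string flag, so it can reject a closer that does not match the open context. It checks structure only. Numbers, literals, and key/value placement are not validated.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration of context tracking; not Drill's JsonReader or Jackson's parser.
final class ContextTracker {
    enum Context { ROOT, OBJECT, ARRAY }

    // Scans the text and returns the innermost context still open at the end
    // (ROOT means every object and array was closed). Throws on a closer that
    // does not match the current context. Only structure is checked, not values.
    static Context openContextAfter(String text) {
        Deque<Context> stack = new ArrayDeque<>();
        stack.push(Context.ROOT);
        boolean inString = false;
        boolean escaped = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inString) {
                // Inside a string, braces and brackets are ordinary characters.
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"') {
                inString = true;
            } else if (c == '{') {
                stack.push(Context.OBJECT);
            } else if (c == '[') {
                stack.push(Context.ARRAY);
            } else if (c == '}' || c == ']') {
                Context expected = (c == '}') ? Context.OBJECT : Context.ARRAY;
                if (stack.peek() != expected) {
                    throw new IllegalStateException(
                        "unexpected '" + c + "' at offset " + i + " in " + stack.peek() + " context");
                }
                stack.pop();
            }
        }
        return stack.peek();
    }

    public static void main(String[] args) {
        System.out.println(openContextAfter("{\"a\": [1, {\"b\": \"}\"}]}")); // ROOT
        System.out.println(openContextAfter("{\"a\": [1, 2"));               // ARRAY
        try {
            openContextAfter("{\"a\": [1, 2}");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A plain token stream would see the same characters in the last input. The tracker rejects it only because it remembers that an array, not an object, is open at that point.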
- We cannot keep track of {} accurately. For example, the counting JSON
processor calls parser.skipChildren(), which tries to skip to the end of the
current JSON object, but this can roll over to the next line when the
bottom-most nested JSON object is malformed; see the example below (the
closing " after value is missing in the innermost object). The JsonReader
behaves the same way.
{"balance": 1000.0,"num": 100,"is_vip": true,"name": "foo3","curr":{"denom":"pound","test":{"value :false}}}
- One possible solution is to rewind the input source to reset the stream,
but that is not recommended, and there is no guarantee that all streams
support mark/reset semantics.
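The two points above can be sketched together in Java. NaiveSkip.skipObject is a hypothetical brace counter, not Jackson's skipChildren(). Like any skip based on counting {}, it relies on quotes to find string boundaries. On the malformed record, the unterminated "value string swallows the newline and the record that follows. The last lines of main show why rewinding is not a general fix: mark/reset is optional in java.io, and a stream that does not override markSupported() reports false.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical brace-counting skip in the spirit of skipping an object's children;
// not Jackson's skipChildren() implementation.
final class NaiveSkip {
    // The malformed record from the comment: the closing " after value is missing.
    static final String MALFORMED = "{\"balance\": 1000.0,\"num\": 100,\"is_vip\": true,\"name\": \"foo3\","
        + "\"curr\":{\"denom\":\"pound\",\"test\":{\"value :false}}}";
    // The same record with the quote restored.
    static final String WELL_FORMED = MALFORMED.replace("\"value :", "\"value\":");
    static final String NEXT_RECORD = "{\"balance\": 2000.0}";

    // Reads one object starting at '{' and returns how many newlines were consumed
    // before its matching '}' (or before end of input, if it never closes).
    static int skipObject(Reader in) {
        try {
            if (in.read() != '{') {
                throw new IllegalArgumentException("expected '{'");
            }
            int depth = 1;
            int newlines = 0;
            boolean inString = false;
            boolean escaped = false;
            int ch;
            while ((ch = in.read()) != -1) {
                char c = (char) ch;
                if (c == '\n') {
                    newlines++;  // counted even inside an unterminated string
                }
                if (inString) {
                    if (escaped) {
                        escaped = false;
                    } else if (c == '\\') {
                        escaped = true;
                    } else if (c == '"') {
                        inString = false;
                    }
                } else if (c == '"') {
                    inString = true;
                } else if (c == '{') {
                    depth++;
                } else if (c == '}' && --depth == 0) {
                    return newlines;
                }
            }
            return newlines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Well-formed: the skip stops at the end of line 1.
        System.out.println(skipObject(new StringReader(WELL_FORMED + "\n" + NEXT_RECORD))); // 0
        // Malformed: the unterminated string swallows the newline and the next record.
        System.out.println(skipObject(new StringReader(MALFORMED + "\n" + NEXT_RECORD)));   // 1

        // Backing up to the start of line 2 would need mark/reset, which is optional:
        // a stream that does not override markSupported() inherits false from InputStream.
        InputStream plain = new InputStream() {
            @Override
            public int read() {
                return -1;
            }
        };
        System.out.println(plain.markSupported());                          // false
        System.out.println(new BufferedInputStream(plain).markSupported()); // true
    }
}
```

In the JDK, BufferedInputStream supports mark/reset, but InflaterInputStream (and therefore GZIPInputStream) does not. A compressed JSON source therefore could not be rewound this way without extra buffering.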
Given where we are, I think the proposed solution works well for almost
all malformed JSON records.
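If records are newline-delimited, the damage from a malformed record can be kept to its own line. The Java sketch below makes three assumptions. First, it assumes one record per line. Second, it uses a crude structural check in place of a real JSON parser. Third, it models the issue's ignore.malformed.json suggestion as a hypothetical ignoreMalformed flag. It is not necessarily how the pull request implements the fix.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of line-bounded recovery; isWellFormed is a crude structural
// stand-in for a real JSON parser, and one record per line is assumed.
final class LineSkippingReader {
    static final class Result {
        final List<String> records = new ArrayList<>();
        int skipped;
    }

    // Structural check only: strings closed, braces/brackets balanced and properly nested.
    static boolean isWellFormed(String line) {
        StringBuilder expectedClosers = new StringBuilder();
        boolean inString = false;
        boolean escaped = false;
        for (char c : line.toCharArray()) {
            if (inString) {
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"') {
                inString = true;
            } else if (c == '{' || c == '[') {
                expectedClosers.append(c == '{' ? '}' : ']');
            } else if (c == '}' || c == ']') {
                int last = expectedClosers.length() - 1;
                if (last < 0 || expectedClosers.charAt(last) != c) {
                    return false;
                }
                expectedClosers.setLength(last);
            }
        }
        return !inString && expectedClosers.length() == 0;
    }

    // ignoreMalformed is a hypothetical flag modeled on the ignore.malformed.json idea in the issue.
    static Result read(String input, boolean ignoreMalformed) {
        Result result = new Result();
        try (BufferedReader reader = new BufferedReader(new StringReader(input))) {
            String line;
            int lineNo = 0;
            while ((line = reader.readLine()) != null) {
                lineNo++;
                if (line.trim().isEmpty()) {
                    continue;  // blank lines carry no record
                }
                if (isWellFormed(line)) {
                    result.records.add(line);
                } else if (ignoreMalformed) {
                    result.skipped++;  // the damage stays confined to this one line
                } else {
                    throw new IllegalStateException("malformed JSON on line " + lineNo);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        String input = String.join("\n",
            "{\"name\": \"foo1\", \"curr\": {\"denom\": \"pound\"}}",
            "{\"name\": \"foo3\", \"curr\": {\"test\": {\"value :false}}}",
            "{\"name\": \"foo4\"}");
        Result r = read(input, true);
        System.out.println(r.records.size() + " kept, " + r.skipped + " skipped"); // 2 kept, 1 skipped
        try {
            read(input, false);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // malformed JSON on line 2
        }
    }
}
```

Because the check is structural only, a line such as {"a" 1} would still be accepted. A pretty-printed record spanning several lines would be split and rejected. A real reader would pass each line to the JSON parser instead.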
> Malformed JSON should not stop the entire query from progressing
> ----------------------------------------------------------------
>
> Key: DRILL-4653
> URL: https://issues.apache.org/jira/browse/DRILL-4653
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - JSON
> Affects Versions: 1.6.0
> Reporter: subbu srinivasan
> Fix For: Future
>
>
> Currently a Drill query terminates on the first invalid JSON line it encounters.
> Drill should keep making progress by skipping the bad records. A setting
> such as ignore.malformed.json would help.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)