[
https://issues.apache.org/jira/browse/DRILL-4653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15523997#comment-15523997
]
ASF GitHub Bot commented on DRILL-4653:
---------------------------------------
Github user paul-rogers commented on the issue:
https://github.com/apache/drill/pull/518
As it turns out, the sample code shown was actually tested with a stock
Jackson JSON parser: it does work. No parser changes are needed.
The issue is not whether we can make the parser do what is needed: the code
posted in the comment above demonstrated a solution.
The issue is how we incorporate that code into the JSON parser to clean up
partial records and prevent schema changes. When I have time, I'll investigate
that question in greater depth.
IMHO, without a proper fix, we should simply state that Drill does not
support malformed JSON. If an input file might be incorrect, run it through a
clean-up step before allowing Drill to scan it. Otherwise, we are opening the
door to many hard-to-resolve bugs when people ask Drill to scan corrupt JSON:
without a proper fix the result would be undefined, which is worse than the
current behavior of simply failing the scan with an error.
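As an illustration of the clean-up step suggested above, here is a minimal
sketch in Python. It assumes the input is line-delimited JSON with one object
per line (as the issue description implies); the function name
clean_json_lines is hypothetical and is not part of Drill or Jackson.

```python
import json
from typing import Iterable, List, Tuple


def clean_json_lines(lines: Iterable[str]) -> Tuple[List[str], List[int]]:
    """Split line-delimited JSON into valid records and rejected line numbers.

    Hypothetical helper: a line is kept only if it parses as a JSON object.
    """
    good: List[str] = []
    bad: List[int] = []
    for line_no, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # blank lines are neither kept nor reported
        try:
            value = json.loads(text)
        except json.JSONDecodeError:
            bad.append(line_no)  # malformed JSON: remember where it was
            continue
        if isinstance(value, dict):
            good.append(text)
        else:
            bad.append(line_no)  # valid JSON, but not a record
    return good, bad
```

The valid lines can be written to a new file for Drill to scan, and the
rejected line numbers logged for whoever owns the data.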
Let's follow up again after I (or someone else) have had a chance to figure
out whether we can undo a partially built record. If we can, then we've got a
path to a clean solution: recover the parser (as shown earlier) and discard the
in-flight record (which we still need to research).
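To make the "discard the in-flight record" idea concrete, here is a toy sketch
in Python. It is not Drill's vector or writer code: the RowBatch class, its
columns, and the assumption that each field appears at most once per record are
all hypothetical. It only shows the general shape of the idea, which is to
track the number of committed rows and truncate every column back to that
count when a record fails part-way through.

```python
class RowBatch:
    """Toy columnar batch: one Python list per column (hypothetical)."""

    def __init__(self, column_names):
        self.columns = {name: [] for name in column_names}
        self.row_count = 0  # number of fully committed rows

    def rollback(self):
        """Drop any values written past the last committed row."""
        for values in self.columns.values():
            del values[self.row_count:]

    def write_record(self, fields):
        """Write (name, value) pairs; undo the partial row on failure.

        A ValueError stands in for a parse error raised mid-record; a
        KeyError means the record names a column the batch does not have.
        """
        try:
            for name, value in fields:
                self.columns[name].append(value)
        except (ValueError, KeyError):
            self.rollback()
            return False
        # Pad columns the record did not mention, then commit the row.
        for values in self.columns.values():
            if len(values) == self.row_count:
                values.append(None)
        self.row_count += 1
        return True
```

Whether Drill's real record-building path can be unwound this cheaply is
exactly the open question.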
> Malformed JSON should not stop the entire query from progressing
> ----------------------------------------------------------------
>
> Key: DRILL-4653
> URL: https://issues.apache.org/jira/browse/DRILL-4653
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - JSON
> Affects Versions: 1.6.0
> Reporter: subbu srinivasan
> Fix For: Future
>
>
> Currently a Drill query terminates on the first invalid JSON line it
> encounters. Drill should instead skip the bad records and continue. A
> setting along the lines of (ignore.malformed.json) would help.