[
https://issues.apache.org/jira/browse/DRILL-7821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17279153#comment-17279153
]
Martin Kusnir commented on DRILL-7821:
--------------------------------------
I believe this particular issue can be fixed by surrounding the MapWriter call
in handleString() at org.apache.drill.exec.vector.complex.fn.JsonReader.java
with a try-catch block to avoid throwing an exception if the input text is
either null or empty. This results in Drill interpreting an empty string as an
empty object (if the object type has already been inferred from an earlier key)
so "" is treated as {}.
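
To make the idea concrete, here is a minimal sketch of the guard in plain
Java. It is a toy model, not Drill code: EmptyStringGuard, Kind, recordMap()
and writeString() are hypothetical names invented for this illustration, and
writeString() only stands in for the MapWriter call that fails when a string
arrives for a key already inferred as a map.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only; none of these types are Drill classes.
class EmptyStringGuard {
    enum Kind { STRING, MAP }

    // Type fixed for each key by its first occurrence.
    private final Map<String, Kind> inferred = new HashMap<>();

    // Records that a key was first seen with an object value.
    public void recordMap(String key) {
        inferred.putIfAbsent(key, Kind.MAP);
    }

    // Hypothetical stand-in for the writer call: throws on a type mismatch.
    private String writeString(String key, String text) {
        Kind kind = inferred.computeIfAbsent(key, k -> Kind.STRING);
        if (kind != Kind.STRING) {
            throw new IllegalStateException("schema change on key: " + key);
        }
        return text;
    }

    // Returns a textual form of what would be stored for a JSON string token.
    public String handleString(String key, String text) {
        try {
            return writeString(key, text);
        } catch (IllegalStateException e) {
            // Only null or empty input is swallowed and treated as {}.
            if (text == null || text.isEmpty()) {
                return "{}";
            }
            throw e;
        }
    }
}
```

In this sketch a non-empty string for a map-typed key still fails, so only
the "" case changes behaviour.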
However, this does not solve the reverse case, where the first instance of a
key is an empty string and the same key later appears with an object value. In
the provided file servicenow.json, this occurs at 324:19: the value for
"company" at that location is an object, but earlier in the file it appeared
as "".
I'm not sure how this case could be addressed without significant changes to
the JSON parser, due to the way Drill performs a lookup for the writer object
which is based on the type inferred in the first occurrence of the key.
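
The reverse case can be illustrated with a small toy model in plain Java. This
is only a sketch of the first-occurrence behaviour described above, not
Drill's actual lookup: FirstSeenTypes, Kind and accept() are hypothetical
names made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only; none of these types are Drill classes.
class FirstSeenTypes {
    enum Kind { STRING, MAP }

    // The first value seen for a key fixes that key's type.
    private final Map<String, Kind> inferred = new HashMap<>();

    // Returns true if the value fits the type fixed by the key's first occurrence.
    public boolean accept(String key, Kind valueKind) {
        Kind fixed = inferred.computeIfAbsent(key, k -> valueKind);
        return fixed == valueKind;
    }
}
```

If "company" is first seen as "" it is fixed as STRING here, so a later object
value for the same key is rejected, and a guard inside the string handler
never gets a chance to run.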
I noticed that setting exec.enable.union.type=true allows the provided
servicenow.json file to be parsed; however, the feature is listed as
experimental in the docs. Will this become the "proper" way to handle this
issue in a future release, or should I test/pursue this change further?
> Treat Empty String as NULL in JSON Reader
> -----------------------------------------
>
> Key: DRILL-7821
> URL: https://issues.apache.org/jira/browse/DRILL-7821
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - JSON
> Affects Versions: 1.17.0
> Reporter: Charles Givre
> Priority: Major
> Fix For: 1.19.0
>
> Attachments: servicenow.json
>
>
> In the JSON below, the field `resolved_by` first appears as a map, but in
> records where it is not defined it is an empty string. This causes
> SchemaChangeExceptions.
> I'm wondering if Drill could either ignore or interpret the empty strings as
> null so that the query will complete.
>
> {{{
> ....
> "skills": "",
> "number": "INC0000001",
> *"resolved_by": {*
> *"link":
> "https://empmgill4.service-now.com/api/now/table/sys_user/6816f79cc0a8016401c5a33be04be441",*
> *"value": "6816f79cc0a8016401c5a33be04be441"*
> *},*
> "sys_updated_by": "admin"
> ...
> },
> { ... "number": "INC0000002", *"resolved_by": "",*
> "sys_updated_by": "admin", ... }}
> }