[ 
https://issues.apache.org/jira/browse/DRILL-7821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17279153#comment-17279153
 ] 

Martin Kusnir commented on DRILL-7821:
--------------------------------------

I believe this particular issue can be fixed by wrapping the MapWriter call 
in handleString() in org.apache.drill.exec.vector.complex.fn.JsonReader.java 
in a try-catch block, so that no exception is thrown when the input text is 
null or empty. With that change, Drill interprets an empty string as an 
empty object when the object type has already been inferred from an earlier 
occurrence of the key, so "" is treated as {}.
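
To make the idea concrete, here is a rough, self-contained sketch. It is a 
toy model, not the actual Drill code: ToyMapWriter, FieldKind, and the use of 
IllegalStateException are hypothetical stand-ins for MapWriter and whatever 
JsonReader really throws; only the shape of the guard is the point.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these types are Drill classes.
final class EmptyStringSketch {

    enum FieldKind { STRING, MAP }

    /** Hypothetical stand-in for a writer that fixes each field's type on first use. */
    static final class ToyMapWriter {
        private final Map<String, FieldKind> kinds = new HashMap<>();
        private final Map<String, Object> values = new HashMap<>();

        void writeString(String field, String text) {
            claim(field, FieldKind.STRING);
            values.put(field, text);
        }

        void writeMap(String field, Map<String, String> map) {
            claim(field, FieldKind.MAP);
            values.put(field, map);
        }

        Object value(String field) {
            return values.get(field);
        }

        // The first write decides the field's kind; a later mismatch is an error.
        private void claim(String field, FieldKind kind) {
            FieldKind known = kinds.putIfAbsent(field, kind);
            if (known != null && known != kind) {
                throw new IllegalStateException(field + " was already inferred as " + known);
            }
        }
    }

    /**
     * Guarded string handler. Returns true if the text was written as a string,
     * false if a null or empty string was replaced by an empty map.
     */
    static boolean handleString(ToyMapWriter writer, String field, String text) {
        try {
            writer.writeString(field, text);
            return true;
        } catch (IllegalStateException e) {
            if (text == null || text.isEmpty()) {
                writer.writeMap(field, Map.of()); // "" is treated as {}
                return false;
            }
            throw e; // a non-empty string in a map field is still a schema change
        }
    }

    public static void main(String[] args) {
        ToyMapWriter writer = new ToyMapWriter();
        writer.writeMap("resolved_by", Map.of("value", "6816f79cc0a8016401c5a33be04be441"));
        System.out.println(handleString(writer, "resolved_by", "")); // prints "false"
        System.out.println(writer.value("resolved_by"));             // prints "{}"
    }
}
```

In this sketch a non-empty string written to a map field is rethrown, so 
only the null/empty case is tolerated.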

However, this does not solve the reverse case, where the first instance of a 
key is an empty string and the same key later appears with an object value. 
In the provided file servicenow.json, this occurs at 324:19: the value for 
"company" at that location is an object, but earlier in the file it appeared 
as "". I'm not sure how this case could be addressed without significant 
changes to the JSON parser, because Drill looks up the writer object based on 
the type inferred from the first occurrence of the key.
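
A minimal sketch of why the reverse order is harder, again as a toy model 
rather than Drill's real lookup (FirstTypeWinsSketch, FieldKind, and accept() 
are hypothetical names): once "" has fixed the field as a string, the later 
object value is the one that conflicts, so a guard in the string handler is 
never reached.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: a per-field type table filled in on first occurrence.
final class FirstTypeWinsSketch {

    enum FieldKind { STRING, MAP }

    private final Map<String, FieldKind> inferred = new LinkedHashMap<>();

    /** Returns null if the value fits the inferred type, else a conflict message. */
    String accept(String field, Object value) {
        FieldKind kind = (value instanceof Map) ? FieldKind.MAP : FieldKind.STRING;
        FieldKind first = inferred.putIfAbsent(field, kind);
        if (first == null || first == kind) {
            return null;
        }
        return field + ": first seen as " + first + ", now " + kind;
    }

    public static void main(String[] args) {
        FirstTypeWinsSketch reader = new FirstTypeWinsSketch();
        // The empty string arrives first and fixes "company" as STRING.
        System.out.println(reader.accept("company", ""));
        // The later object value is rejected because the type is already fixed.
        System.out.println(reader.accept("company", Map.of("value", "x")));
    }
}
```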

I noticed that setting exec.enable.union.type=true allows the provided 
servicenow.json file to be parsed; however, the feature is listed as 
experimental in the docs. Will this become the "proper" way to handle this 
issue in a future release, or should I test/pursue this change further?

> Treat Empty String as NULL in JSON Reader
> -----------------------------------------
>
>                 Key: DRILL-7821
>                 URL: https://issues.apache.org/jira/browse/DRILL-7821
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - JSON
>    Affects Versions: 1.17.0
>            Reporter: Charles Givre
>            Priority: Major
>             Fix For: 1.19.0
>
>         Attachments: servicenow.json
>
>
> In the JSON below, the field `resolved_by` first appears as a map, but in 
> records where it is not defined it is an empty string. This causes 
> SchemaChangeExceptions. 
> I'm wondering if Drill could either ignore or interpret the empty strings as 
> null so that the query will complete.
>  
> {
>   ...
>   "skills": "",
>   "number": "INC0000001",
>   "resolved_by": {
>     "link": "https://empmgill4.service-now.com/api/now/table/sys_user/6816f79cc0a8016401c5a33be04be441",
>     "value": "6816f79cc0a8016401c5a33be04be441"
>   },
>   "sys_updated_by": "admin"
>   ...
> },
> {
>   ...
>   "number": "INC0000002",
>   "resolved_by": "",
>   "sys_updated_by": "admin",
>   ...
> }



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
