[ 
https://issues.apache.org/jira/browse/DRILL-7979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391816#comment-17391816
 ] 

ASF GitHub Bot commented on DRILL-7979:
---------------------------------------

cgivre commented on pull request #2283:
URL: https://github.com/apache/drill/pull/2283#issuecomment-891377214


   > Yes, I think I did grok the unknown schema problem. The thought above, 
which somehow escaped all the striking out I did to it after thinking a bit 
more, was to take advantage of the fact that a scalar string can be embedded in 
a single-element map. The tuple-generating code would need to know when it 
should do this.
   > 
   > My second comment's comparison of the situation with a JSON property that 
is first null, then an object, is also a bit dubious because empty XML elements 
do not represent nulls (from what I read) so much as zero-length strings.
   > 
   > If there is an effort to make querying XML behave in a more similar way to 
querying equivalent JSON, for some definition of equivalent, it should probably 
wait for another PR.
   
   I think you're right about that. From what I remember, there is an option 
for Drill's JSON parser to treat `NaN` and something else as `null`. For XML, 
I don't know how you'd distinguish between an empty string and `null`.
   
   This was also an issue with some data I was working on. The JSON version 
used empty strings to denote `null`, and subsequent rows would then contain 
maps, which caused SchemaChange exceptions. The only way to fix that was to use 
the `UNION` data type.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.



> Self-Closing XML Tags Cause Schema Change Exceptions
> ----------------------------------------------------
>
>                 Key: DRILL-7979
>                 URL: https://issues.apache.org/jira/browse/DRILL-7979
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Other
>    Affects Versions: 1.19.0
>            Reporter: Charles Givre
>            Assignee: Charles Givre
>            Priority: Major
>             Fix For: 1.20.0
>
>
> Self-closing XML tags are dealt with strangely by Java's streaming parser.
> If you have data where one row contains a self-closing XML tag `foo`
> (`<foo/>`) but in the next row `foo` contains a map or other nested field,
> Drill will throw a schema change exception.
> This proposed fix causes Drill to ignore self-closing tags unless they have
> attributes, which allows data like this to be queried successfully.
>  
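
The rule described above (skip a self-closing tag unless it carries attributes) can be sketched roughly as follows. This is only an illustration and not the code from the pull request: it assumes the "streaming parser" is StAX (`javax.xml.stream`), and the class `EmptyElementSketch` and method `keptElements` are hypothetical names. StAX reports `<foo/>` and `<foo></foo>` identically, as a start event immediately followed by an end event, so the sketch treats both forms as empty.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, for illustration only; not part of Drill.
final class EmptyElementSketch {

  /**
   * Returns the local names of the elements a reader would keep, in the order
   * they are confirmed, skipping empty elements that have no attributes.
   */
  static List<String> keptElements(String xml) {
    List<String> kept = new ArrayList<>();
    String pendingName = null;          // start tag seen, content not yet known
    boolean pendingHasAttributes = false;
    try {
      XMLStreamReader reader =
          XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
      while (reader.hasNext()) {
        switch (reader.next()) {
          case XMLStreamConstants.START_ELEMENT:
            // A child element proves the pending element is not empty.
            if (pendingName != null) {
              kept.add(pendingName);
            }
            pendingName = reader.getLocalName();
            pendingHasAttributes = reader.getAttributeCount() > 0;
            break;
          case XMLStreamConstants.CHARACTERS:
          case XMLStreamConstants.CDATA:
            // Non-whitespace text also proves the pending element is not empty.
            if (pendingName != null && !reader.isWhiteSpace()) {
              kept.add(pendingName);
              pendingName = null;
            }
            break;
          case XMLStreamConstants.END_ELEMENT:
            // Start followed directly by end: an empty element.
            // Keep it only if it has attributes.
            if (pendingName != null) {
              if (pendingHasAttributes) {
                kept.add(pendingName);
              }
              pendingName = null;
            }
            break;
          default:
            break;
        }
      }
      reader.close();
    } catch (XMLStreamException e) {
      throw new IllegalStateException("Malformed XML", e);
    }
    return kept;
  }
}
```

Calling `EmptyElementSketch.keptElements` on a document where the first row holds `<foo/>` and the second holds `<foo><bar>1</bar></foo>` drops the first `foo` and keeps the second, so a reader following this rule would only ever see `foo` as a nested structure.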



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
