[GitHub] [drill] cgivre commented on pull request #2283: DRILL-7979: Self-Closing XML Tags Cause Schema Change Exceptions

GitBox Mon, 02 Aug 2021 15:31:28 -0700


cgivre commented on pull request #2283:
URL: https://github.com/apache/drill/pull/2283#issuecomment-891377214



   > Yes, I think I did grok the unknown schema problem. The thought above,
   > which somehow escaped all the striking out I did to it after thinking a bit
   > more, was to take advantage of the fact that a scalar string can be embedded
   > into a single-element map. The tuple-generating code would need to become
   > aware of when it should do this.
   >
   > My second comment's comparison of the situation with a JSON property that
   > is first null, then an object, is also a bit dubious because empty XML
   > elements do not represent nulls (from what I read) so much as zero-length
   > strings.
   >
   > If there is an effort to make querying XML behave more like querying
   > equivalent JSON, for some definition of equivalent, it should probably wait
   > for another PR.
   
   I think you're right about that. From what I remember, there is an option
for Drill's JSON parser to treat `NaN` and something else as `null`. For XML,
I don't know how you'd distinguish between an empty string and `null`.
   
   This was also an issue with some data I was working on. The JSON version
used empty strings to denote `null`, and subsequent rows would then contain maps
in the same field, which caused SchemaChange exceptions. The only way to fix
that was to use the `UNION` data type.
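
   To make the problem concrete, here is a minimal Python sketch (not Drill code)
of a field that is an empty string in one row and a map in the next, together
with the "wrap the scalar in a single-element map" idea from the quoted comment.
The sample XML, the `value` key, and the helper names are all assumptions made
up for illustration. The sketch also cheats by looking at every row before
deciding which fields to wrap, which a streaming reader cannot do; that is the
unknown schema problem.

```python
import xml.etree.ElementTree as ET

TEXT_KEY = "value"  # hypothetical key name, not something Drill defines

# Made-up sample: <address/> is self-closing in row 1 and nested in row 2.
XML = """
<rows>
  <row><name>a</name><address/></row>
  <row><name>b</name><address><city>Oslo</city></address></row>
</rows>
"""

def naive(elem):
    # Childless elements become strings ("" for <address/>); others become dicts.
    if len(elem) == 0:
        return elem.text or ""
    return {child.tag: naive(child) for child in elem}

def wrap_scalars(rows):
    # First pass: collect fields that hold a dict in at least one row.
    map_fields = {k for row in rows for k, v in row.items() if isinstance(v, dict)}
    # Second pass: wrap scalar values of those fields in a single-entry dict,
    # so the field has the same kind of value in every row.
    return [
        {k: ({TEXT_KEY: v} if k in map_fields and not isinstance(v, dict) else v)
         for k, v in row.items()}
        for row in rows
    ]

rows = [naive(r) for r in ET.fromstring(XML)]
fixed = wrap_scalars(rows)

print(rows)   # "address" is a str in row 1 and a dict in row 2
print(fixed)  # "address" is a dict in both rows
```

   In `rows` the `address` field changes type between records, which is the
shape of input that triggers the schema change. In `fixed` it is a map in both
records, at the cost of an extra level of nesting for the empty element.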


