paul-rogers commented on PR #13168:
URL: https://github.com/apache/druid/pull/13168#issuecomment-1275431852

   On the mixed-type issue:
   
   > You can use CAST function to convert to a uniform type.
   
   Indeed. This is the solution that the Catalog project proposes. Rather than 
doing the cast in each and every query, tell the catalog the preferred type and 
the cast will be done automagically.
   
   So, for the purposes of this PR, we can require that all values of a column 
are of a single type. Check if the type varies, and if so, throw an exception. 
The user works around the issue with the cast (or, later, with a catalog 
entry). If Druid has some magic, that can be added as a follow-up.
   
   One way to check that the schemas are identical is to compare the schema of 
the next batch to arrive at the merger with that of the current batch. Then, 
when we emit values from the merge, we use that same schema. Of course, we 
actually only care about the types of the sort key columns, so we can be a bit 
more lenient and only compare the key column types.
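   
   A minimal sketch of that key-only check, to make the idea concrete. The 
names here (`KeySchemaCheck`, `ColumnType`, `checkKeyTypes`) are made up for 
illustration and are not Druid classes; a schema is modeled as a plain map from 
column name to type, which is an assumption, not how the merger represents it.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Druid: compares only the sort key
// column types of two batch schemas.
final class KeySchemaCheck {
    // Stand-in for a real column type; illustration only.
    enum ColumnType { VARCHAR, BIGINT, DOUBLE }

    // Throws if any sort key column is missing from the current schema or
    // has a different type in the next batch. Non-key columns are ignored.
    static void checkKeyTypes(
            Map<String, ColumnType> current,
            Map<String, ColumnType> next,
            List<String> keyColumns) {
        for (String key : keyColumns) {
            ColumnType expected = current.get(key);
            ColumnType actual = next.get(key);
            if (expected == null || expected != actual) {
                throw new IllegalStateException(
                    "Sort key column '" + key + "' changed type: expected "
                    + expected + " but the next batch has " + actual
                    + "; use CAST to convert to a uniform type");
            }
        }
    }
}
```

   The merger would call something like this once per incoming batch, before 
merging its rows. A batch whose non-key columns differ in type still passes, 
which is the "bit more lenient" behavior described above.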
   
   A side note about the "type conflict magic". In other projects, we failed to 
find the magic. If you have `VARCHAR` and `BIGINT`, which type wins? What if you 
saw 10K of the `VARCHAR` rows before the first `BIGINT` row? Or vice versa? 
Given this, the solution that requires the user to break the tie is reasonable.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

