paul-rogers commented on PR #13168: URL: https://github.com/apache/druid/pull/13168#issuecomment-1275431852
On the mixed-type issue:

> You can use CAST function to convert to a uniform type.

Indeed. This is the solution that the Catalog project proposes. Rather than do the cast in each and every query, tell the catalog the preferred type and the cast will be done automagically.

So, for the purposes of this PR, we can require that all values in a column are of a single type. Check whether the type varies and, if so, throw an exception. The user works around the issue with the cast (or, later, with a catalog entry). If Druid has some magic, that can be added as a follow-up.

One way to check that the schemas are identical is to compare the schema of the next batch to arrive at the merger with that of the current batch. Then, when we emit values from the merge, we use that same schema. Of course, we actually only care about the types of the sort key columns, so we can be a bit more lenient and only compare the key column types.

A side note about the "type conflict magic". In other projects, we failed to find the magic. If you have `VARCHAR` and `BIGINT`, which type wins? What if you saw 10K of the `VARCHAR` rows before the first `BIGINT` row? Or vice versa? Given this, the solution that requires the user to break the tie is reasonable.
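
To make the key-column check concrete, here is a minimal sketch of comparing the next batch's schema against the current one. It is only an illustration: `KeySchemaCheck`, `ColType`, and `checkKeyTypes` are hypothetical names, not Druid classes or anything in this PR, and a schema is modeled as a plain map from column name to type.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from Druid or from this PR.
class KeySchemaCheck {

    // Stand-in for a column type; a real type system would be richer.
    enum ColType { VARCHAR, BIGINT, DOUBLE }

    /**
     * Compares only the sort key columns of two batch schemas.
     * Non-key columns may differ between batches without error.
     * Throws if a key column is missing or its type has changed.
     */
    static void checkKeyTypes(
            Map<String, ColType> current,
            Map<String, ColType> next,
            List<String> sortKeys) {
        for (String key : sortKeys) {
            ColType expected = current.get(key);
            ColType actual = next.get(key);
            if (expected == null || actual == null) {
                throw new IllegalStateException("Sort key column missing: " + key);
            }
            if (expected != actual) {
                // No attempt to pick a winner: the user breaks the tie with a CAST.
                throw new IllegalStateException(
                    "Type of sort key column '" + key + "' changed from "
                    + expected + " to " + actual + "; use CAST to get a uniform type");
            }
        }
    }

    public static void main(String[] args) {
        Map<String, ColType> batch1 = Map.of("k", ColType.BIGINT, "v", ColType.VARCHAR);
        Map<String, ColType> batch2 = Map.of("k", ColType.BIGINT, "v", ColType.DOUBLE);
        Map<String, ColType> batch3 = Map.of("k", ColType.VARCHAR, "v", ColType.VARCHAR);

        // Passes: only the non-key column "v" differs.
        checkKeyTypes(batch1, batch2, List.of("k"));

        // Fails: the key column "k" switched from BIGINT to VARCHAR.
        try {
            checkKeyTypes(batch1, batch3, List.of("k"));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The merger would call something like this each time a new batch arrives, before reading any rows from it, and would emit its output using the schema of the first batch.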
