sascha-coenen commented on issue #7583: [Proposal] segmentMetadata query returns full list of dimensions
URL: https://github.com/apache/incubator-druid/issues/7583#issuecomment-488003883

This is an awesome feature request! This has been an ongoing issue on our side as well.

How sparse a dimension is should not make a given query fail. If a dimension contains a value in 1000 cases, queries work and the dimension is considered part of the schema. The same holds if the dimension contains a single non-null record and therefore has cardinality 1. If that is permissible, then a drop in cardinality from 1 to 0, which is a very small difference, must not be the deciding factor for whether a dimension makes it into the schema, or for whether an SQL query fails or succeeds.

From my user's perspective I applaud this proposal and believe that the current behaviour ought to be changed. I'm afraid that people might be hesitant to make this change though, because changing the behaviour of the segmentMetadata query might be seen as a breaking change that is not backwards compatible. However, one might introduce a boolean flag into the request schema of a segmentMetadata query that indicates whether the automatic schema construction should omit fields of cardinality=0 or include them. By setting the default to omit, one stays backwards compatible yet allows folks to have sparse dimensions.

As native queries do NOT fail on dimensions with cardinality 0, and even work for non-existing dimensions because their semantics are also well defined (cardinality=0, min=null, max=null, count=0, etc.), I would also propose to fix the situation that SQL queries fail on cardinality=0 dimensions, and would even suggest that they should also work on non-existing fields. However, whether SQL should fail or succeed on non-existing fields is highly debated, so it would be okay to let an SQL statement fail IF a field does not exist.

BUT the issue is that currently SQL statements are failing even on fields that DO exist and are even listed in the segment dimension lists available via the coordinator console, as Vladimir pointed out, and yet the SQL query fails. At the very least, this behaviour is inconsistent, and I would agree to resolve this inconsistency by accepting a dimension with 0 entries as an existing dimension with cardinality 0 and count 0.
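To make the proposal concrete, here is a minimal Python sketch of the two ideas above: the well-defined summary of an empty or missing dimension, and an opt-in flag for schema construction. It is a toy model over plain dictionaries, not Druid's implementation. The function names and the `includeEmptyDimensions` request field are hypothetical, since the comment only proposes "a boolean flag"; the other request fields follow the segmentMetadata query as I understand it.

```python
from typing import Any, Dict, List, Optional


def build_segment_metadata_query(data_source: str,
                                 include_empty_dimensions: bool = False) -> Dict[str, Any]:
    """Build a segmentMetadata request body with the proposed opt-in flag."""
    return {
        "queryType": "segmentMetadata",
        "dataSource": data_source,
        "merge": True,
        "analysisTypes": ["cardinality"],
        # HYPOTHETICAL field: not part of Druid, named here for illustration only.
        "includeEmptyDimensions": include_empty_dimensions,
    }


def summarize_dimension(rows: List[Dict[str, Optional[str]]],
                        dimension: str) -> Dict[str, Any]:
    """Summarize a dimension; an all-null or absent dimension yields zeros and nulls."""
    non_null = [row[dimension] for row in rows if row.get(dimension) is not None]
    return {
        "cardinality": len(set(non_null)),
        "count": len(non_null),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }


def build_schema(rows: List[Dict[str, Optional[str]]],
                 declared_dimensions: List[str],
                 include_empty: bool = False) -> List[str]:
    """Keep declared dimensions; drop cardinality-0 ones unless include_empty is set."""
    return [
        dim for dim in declared_dimensions
        if include_empty or summarize_dimension(rows, dim)["cardinality"] > 0
    ]


rows = [{"country": "DE", "campaign": None}, {"country": "FR"}]
default_schema = build_schema(rows, ["country", "campaign"])
full_schema = build_schema(rows, ["country", "campaign"], include_empty=True)
```

With the default (omit), `campaign` is left out of the schema exactly as today, so existing callers see no change. With the flag set, it is kept as a dimension with cardinality 0 and count 0, which is the same summary a dimension that never existed would get.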
