sascha-coenen commented on issue #7583: [Proposal] segmentMetadata query 
returns full list of dimensions
URL: 
https://github.com/apache/incubator-druid/issues/7583#issuecomment-488003883
 
 
   This is an awesome feature request!
   
   This has been an ongoing issue on our side as well.
   How sparse a dimension is should not make a given query fail.
   
   If a dimension contains a value in 1000 rows, queries work and the 
dimension is considered part of the schema. The same holds if the dimension 
contains a single non-null record and therefore has cardinality 1.
   If this is permissible, then a drop in cardinality from 1 to 0, which is a 
very small difference, must not be the deciding factor for whether a dimension 
makes it into the schema, or for whether an SQL query fails or succeeds.
   
   From my perspective as a user, I applaud this proposal and believe that the 
current behaviour indeed ought to be changed.
   
   I'm afraid that people might be hesitant to make this change though, the 
reason being that changing the behaviour of the segmentMetadata query might be 
seen as a breaking change that is not backwards compatible.
   However, one might introduce a boolean flag into the request schema of a 
segmentMetadata query that indicates whether the automatic schema construction 
should omit fields of cardinality=0 or include them. By setting the default to 
omit, one stays backwards compatible and still allows folks to have sparse 
dimensions.
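
   A minimal sketch of what such an opt-in flag could look like from the client 
side. The flag name `includeEmptyDimensions` is purely hypothetical and is not an 
existing Druid option; the data source name and interval are placeholders, and 
`visible_columns` only illustrates the proposed filtering rule, not Druid's 
actual schema construction.

```python
def build_segment_metadata_query(data_source, interval, include_empty=False):
    """Build a segmentMetadata query body as a plain dict."""
    query = {
        "queryType": "segmentMetadata",
        "dataSource": data_source,
        "intervals": [interval],
        "analysisTypes": ["cardinality"],
        "merge": True,
    }
    if include_empty:
        # HYPOTHETICAL flag proposed in this comment; Druid does not define it.
        query["includeEmptyDimensions"] = True
    return query


def visible_columns(cardinalities, include_empty=False):
    """Apply the proposed rule to a mapping of column name -> cardinality.

    With include_empty=False (the backwards-compatible default), columns with
    cardinality 0 are dropped from the schema; with True they are kept.
    """
    return sorted(
        name
        for name, cardinality in cardinalities.items()
        if include_empty or cardinality > 0
    )


if __name__ == "__main__":
    import json

    print(json.dumps(build_segment_metadata_query(
        "example_datasource", "2019-01-01/2019-02-01", include_empty=True), indent=2))
    print(visible_columns({"country": 12, "referrer": 0}, include_empty=True))
```

   Because the flag is only added to the request when it is set, existing 
clients that never send it would keep seeing today's behaviour.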
   
   As native queries do NOT fail on dimensions with cardinality 0 and even work 
for non-existing dimensions because their semantics are also well defined 
(cardinality=0, min=null, max=null, count=0, etc), I would also propose to fix 
the situation that SQL queries fail on cardinality=0 dimensions and would even 
suggest that they also should work on non-existing fields.
   However, whether SQL should fail or succeed on non-existing fields is highly 
debated, so it would be okay to let an SQL statement fail IF a field does not 
exist.
   BUT the issue is that currently SQL statements are failing even on fields 
that DO exist and are even listed in the segment dimension lists available via 
the coordinator console, as Vladimir pointed out, and yet the SQL query fails.
   At the very least, this behaviour is inconsistent and I would agree to 
resolve this inconsistency by accepting a dimension with 0 entries as an 
existing dimension with cardinality 0 and count 0.
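
   To illustrate why these semantics are well defined, here is a small sketch in 
plain Python (not Druid code) that computes the statistics mentioned above over 
a list of rows. The function name `dimension_stats` and the sample rows are 
assumptions made for illustration only. A dimension that is entirely null and a 
dimension that does not exist at all yield the same result.

```python
def dimension_stats(rows, dimension):
    """Summarise one dimension over a list of row dicts.

    Missing keys and explicit None values are both treated as null, so an
    all-null dimension and a non-existing dimension are indistinguishable.
    """
    values = [row[dimension] for row in rows if row.get(dimension) is not None]
    return {
        "cardinality": len(set(values)),
        "min": min(values) if values else None,
        "max": max(values) if values else None,
        "count": len(values),
    }


if __name__ == "__main__":
    rows = [
        {"country": "DE", "referrer": None},
        {"country": "US"},
    ]
    for name in ("country", "referrer", "does_not_exist"):
        print(name, dimension_stats(rows, name))
```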
   
   
   
   
