gianm commented on issue #7583: [Proposal] segmentMetadata query returns full list of dimensions
URL: https://github.com/apache/incubator-druid/issues/7583#issuecomment-488131882
 
 
   > Situation got worse after introducing SQL to Druid. For now Druid fails 
when these dimensions are requested in SQL expression. Example of the error 
response:
   
   FWIW, Druid SQL checks segment metadata for _all_ segments in a datasource 
(not just the time range being queried), so this 'column not found' error 
should only happen if a column has no values at any point, in any segment.
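As a rough illustration of why the error requires a column to be empty everywhere, here is a minimal Python model. It assumes each segment can be summarized by the set of columns it physically stores; `Segment`, `sql_columns`, and `resolve` are hypothetical names, not Druid APIs, and the real logic lives in Druid's Java SQL layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical summary of one segment: its interval and stored columns."""
    interval: tuple[str, str]
    columns: frozenset[str]


def sql_columns(segments: list[Segment]) -> set[str]:
    """Union the physical columns of every segment, ignoring intervals.

    Mirrors the behavior described above: the SQL schema comes from all
    segments in the datasource, not only those overlapping a query.
    """
    schema: set[str] = set()
    for seg in segments:
        schema |= seg.columns
    return schema


def resolve(column: str, segments: list[Segment]) -> str:
    """Return the column name, or raise if no segment stores it."""
    if column not in sql_columns(segments):
        raise ValueError(f"column not found: {column}")
    return column


segments = [
    Segment(("2019-01-01", "2019-01-02"), frozenset({"__time", "page", "user"})),
    Segment(("2019-01-02", "2019-01-03"), frozenset({"__time", "page", "country"})),
]

# "country" only has data on the second day, yet it still resolves.
print(resolve("country", segments))  # country

# "city" may be configured at ingestion, but no segment ever stored it.
try:
    resolve("city", segments)
except ValueError as err:
    print(err)  # column not found: city
```

In this model, a query restricted to the first day can still reference `country`, because the schema is not narrowed to the queried interval.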
   
   > Change segmentMetadata query to output all dimensions even if they do not 
have data in the requested time range.
   
   This wouldn't help with the Druid SQL issue, for the reason mentioned 
above.
   
   The heart of the issue is the difference between the list of dimensions that 
one has configured at ingestion time, and the list of columns that are actually 
physically stored within the Druid segments. Druid won't store columns that 
don't have any data, as an optimization to avoid storing a column full of 
nulls. So I think there are two ways to address this:
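The gap between configured dimensions and stored columns can be sketched as follows, assuming a simplified row-oriented persist step. `persist` is a hypothetical helper; real Druid segments are columnar files with dictionaries and indexes, not Python lists, so this only models the "skip all-null columns" decision.

```python
from typing import Any


def persist(rows: list[dict[str, Any]], configured_dims: list[str]) -> dict[str, list[Any]]:
    """Hypothetical persist step.

    Builds one column per configured dimension, then drops every column whose
    values are all None, modeling the space optimization described above.
    """
    columns = {dim: [row.get(dim) for row in rows] for dim in configured_dims}
    return {dim: vals for dim, vals in columns.items() if any(v is not None for v in vals)}


rows = [
    {"page": "Main_Page", "user": "alice"},
    {"page": "Druid", "user": None},
]
configured = ["page", "user", "city"]

stored = persist(rows, configured)
print(sorted(stored))  # ['page', 'user']
print([d for d in configured if d not in stored])  # ['city']
```

A column that is only partly null (`user` here) is kept; only a dimension with no values at all (`city`) disappears from the stored segment, even though it was configured.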
   
   1. Change things such that Druid _does_ store columns that don't have any 
data, ideally in a special lightweight way so they don't take up much space 
(beyond some metadata that the column exists and is empty).
   2. Change things such that Druid stores the list of configured dimensions in 
the segment `Metadata` object. (It already stores aggregators and 
queryGranularity in here, which is how those features of the segmentMetadata 
query work.) Then, the segmentMetadata queries could return the list of configured 
dimensions, which would potentially be more comprehensive than the list of 
columns that actually got stored. Also, change Druid SQL to consider these 
dimensions-that-don't-have-physical-columns as valid SQL columns.
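Option (2) could look roughly like the sketch below. It is a Python model, not Druid's actual `Metadata` class: `SegmentMetadata`, its `dimensions` field, and the helper functions are hypothetical names chosen for illustration. The `aggregators` and `query_granularity` fields only stand in for what the comment says `Metadata` already holds.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SegmentMetadata:
    """Hypothetical stand-in for the per-segment `Metadata` object.

    `dimensions` is the proposed new field. It is Optional so that segments
    written before the change (which lack it) can still be read.
    """
    aggregators: list[str] = field(default_factory=list)
    query_granularity: str = "none"
    dimensions: Optional[list[str]] = None


@dataclass
class Segment:
    physical_columns: set[str]
    metadata: SegmentMetadata


def segment_dimensions(seg: Segment) -> set[str]:
    """Configured dimensions, if recorded, plus whatever is physically stored."""
    configured = set(seg.metadata.dimensions or [])
    return configured | seg.physical_columns


def sql_columns(segments: list[Segment]) -> set[str]:
    """SQL schema: union over all segments, now including empty dimensions."""
    result: set[str] = set()
    for seg in segments:
        result |= segment_dimensions(seg)
    return result


new_seg = Segment({"__time", "page"}, SegmentMetadata(dimensions=["page", "city"]))
old_seg = Segment({"__time", "user"}, SegmentMetadata())  # written before the change

print(sorted(sql_columns([new_seg, old_seg])))  # ['__time', 'city', 'page', 'user']
```

Here `city` never got a physical column, but it is still a valid SQL column because the configured list travels with the segment. Older segments simply report `dimensions=None` and fall back to their physical columns, which is one way to read the backwards-compatibility point in the recommendation that follows.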
   
   I'd probably recommend (2) since this dimensions list might be useful for 
other purposes too, and it doesn't require changes to the segment storage 
format, so it doesn't have any backwards-compatibility concerns.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.