gianm commented on issue #7583: [Proposal] segmentMetadata query returns full list of dimensions
URL: https://github.com/apache/incubator-druid/issues/7583#issuecomment-488131882

> Situation got worse after introducing SQL to Druid. For now Druid fails when these dimensions are requested in SQL expression. Example of the error response:

Fwiw, Druid SQL checks segment metadata for _all_ segments in a datasource (not just the time range being queried), so this 'column not found' error should only happen if a column has no values at any point, in any segment.

> Change segmentMetadata query to output all dimensions even they do not have data in the requested time range.

This wouldn't help with the Druid SQL issue, due to the reason mentioned above.

The heart of the issue is the difference between the list of dimensions that one has configured at ingestion time, and the list of columns that are actually physically stored within the Druid segments. Druid won't store columns that don't have any data, as an optimization to avoid storing a column full of nulls. So I think there are two ways to address this:

1. Change things such that Druid _does_ store columns that don't have any data, ideally in a special lightweight way so they don't take up much space (beyond some metadata that the column exists and is empty).
2. Change things such that Druid stores the list of configured dimensions in the segment `Metadata` object. (It already stores aggregators and queryGranularity in here, which is how those features of the segmentMetadata work.) Then, the segmentMetadata queries could return the list of configured dimensions, which would potentially be more comprehensive than the list of columns that actually got stored. Also, change Druid SQL to consider these dimensions-that-don't-have-physical-columns as valid SQL columns.

I'd probably recommend (2) since this dimensions list might be useful for other purposes too, and it doesn't require changes to the segment storage format, so it doesn't have any backwards-compatibility concerns.
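To make the distinction concrete, here is a rough Python sketch of the gap between configured dimensions and physically stored columns, and of what option (2) would report instead. It is a toy model only: `Segment`, `physical_columns`, `reported_dimensions`, and the sample dimension names are hypothetical and are not Druid classes or APIs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Segment:
    """Toy stand-in for a segment (hypothetical, not Druid's real class)."""
    # Dimensions listed in the ingestion spec for this segment.
    configured_dimensions: list[str]
    # Ingested rows, as simple dicts of dimension name -> value.
    rows: list[dict] = field(default_factory=list)

    def physical_columns(self) -> list[str]:
        # Mimics the optimization described above: a dimension with no
        # non-null value in any row is not stored as a column at all.
        return [
            dim for dim in self.configured_dimensions
            if any(row.get(dim) is not None for row in self.rows)
        ]


def reported_dimensions(segments: list[Segment], use_configured_list: bool) -> list[str]:
    """Merge dimensions across all segments, keeping first-seen order."""
    seen: dict[str, None] = {}
    for segment in segments:
        dims = (
            segment.configured_dimensions
            if use_configured_list
            else segment.physical_columns()
        )
        for dim in dims:
            seen.setdefault(dim, None)
    return list(seen)


# "campaign" is configured everywhere but never has a value in any segment.
segments = [
    Segment(["country", "device", "campaign"], [{"country": "US", "device": "ios"}]),
    Segment(["country", "device", "campaign"], [{"country": "DE"}]),
]

current = reported_dimensions(segments, use_configured_list=False)
proposed = reported_dimensions(segments, use_configured_list=True)
print(current)   # columns that physically exist somewhere
print(proposed)  # dimensions configured at ingestion time
```

In this model, `device` is still reported today because it has a value in at least one segment, while `campaign` only shows up once the configured list is consulted, which is the case option (2) is meant to cover.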
