clintropolis commented on code in PR #18056:
URL: https://github.com/apache/druid/pull/18056#discussion_r2123027964


##########
docs/querying/projections.md:
##########
@@ -0,0 +1,181 @@
+---
+id: projections
+title: Query projections
+sidebar_label: Projections
+description: .
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+Projections are a type of aggregation that is computed and stored as part of a segment. The pre-aggregated data can speed up queries by reducing the number of rows that need to be processed for any query shape that matches a projection.
+
+## Create a projection
+
+A projection has three components:
+
+- Virtual columns (`spec.projections.virtualColumns`) that are used to compute a projection. The source data for the virtual columns must exist in your datasource.
+- Grouping columns (`spec.projections.groupingColumns`) that are used to group a projection. They must either already exist in your datasource or be defined in `virtualColumns`. The order in which you define your grouping columns equates to the order in which data is sorted in the projection, always ascending.
+- Aggregators (`spec.projections.aggregators`) that define the columns you want to create projections for and which aggregator to use for that column. They must either already exist in your datasource or be defined in `virtualColumns`.
+
+The aggregators are what Druid attempts to match when you run a query. If an aggregator in a query matches an aggregator you defined in your projection, Druid uses it.
+
+You can either create a projection at ingestion time or after the datasource is created.
+
+Note that any projection dimension you create becomes part of your datasource. To remove a projection from your datasource, you need to reingest the data. Alternatively, you can use a query context parameter to not use projections for a specific query.
+
+
+
+### At ingestion time
+
+To create a projection at ingestion time, use the [`projectionsSpec` block in your ingestion spec](../ingestion/ingestion-spec.md#projections).
+
+### After ingestion
+
+:::info
+
+To create a projection for an existing datasource, you must have the `druid-catalog` extension loaded.

Review Comment:
   fwiw, this isn't strictly true - while i think in the future we want to recommend the catalog as the way to do things, it is also possible to define the projection specs in an 'inline' compaction spec as well in a `projections` property (where inline is what we call the class, the non-catalog based compaction spec).

   It is also worth mentioning that the catalog compaction spec is not as fully featured as the inline compaction spec in terms of functionality, for example it can not change the schema of the base table like can be done with an inline spec, and some other things too, i forget off the top of my head.

   The catalog is required to build projections for MSQ inserts/replaces though, so that should probably be


##########
docs/ingestion/ingestion-spec.md:
##########
@@ -396,6 +396,46 @@ The `filter` conditionally filters input rows during ingestion. Only rows that p
 ingested.
Any of Druid's standard [query filters](../querying/filters.md) can be used. Note that within a `transformSpec`, the `transforms` are applied before the `filter`, so the filter can refer to a transform.
+### Projections
+
+Projections are pre-aggregated segments that can speed up queries by reducing the number of rows that need to be processed. Use the `projectionsSpec` block to define projections for your data during ingestion or [create them afterwards](../querying/projections.md#after-ingestion).
+
+Note that any projections you define becomes a dimension for your datasource. To remove a projection from your datasource, you need to reingest the data with the projection removed. Alternatively, you can use a query context parameter to not use projections for a specific query.

Review Comment:
   wording seems off on this, let me think on it and get back to you


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
