techdocsmith commented on a change in pull request #11883:
URL: https://github.com/apache/druid/pull/11883#discussion_r748661697
##########
File path: docs/querying/aggregations.md
##########
@@ -60,15 +60,15 @@ computes the sum of values as a 64-bit, signed integer
#### `doubleSum` aggregator
-Computes and stores the sum of values as 64-bit floating point value. Similar
to `longSum`
+Computes and stores the sum of values as 64-bit floating point value. Similar
to `longSum`.
Review comment:
```suggestion
Computes and stores the sum of values as a 64-bit floating point value.
Similar to `longSum`.
```
##########
File path: docs/querying/aggregations.md
##########
@@ -60,15 +60,15 @@ computes the sum of values as a 64-bit, signed integer
#### `doubleSum` aggregator
-Computes and stores the sum of values as 64-bit floating point value. Similar
to `longSum`
+Computes and stores the sum of values as 64-bit floating point value. Similar
to `longSum`.
```json
{ "type" : "doubleSum", "name" : <output_name>, "fieldName" : <metric_name> }
```
#### `floatSum` aggregator
-Computes and stores the sum of values as 32-bit floating point value. Similar
to `longSum` and `doubleSum`
+Computes and stores the sum of values as 32-bit floating point value. Similar
to `longSum` and `doubleSum`.
Review comment:
```suggestion
Computes and stores the sum of values as a 32-bit floating point value.
Similar to `longSum` and `doubleSum`.
```
##########
File path: docs/querying/dimensionspecs.md
##########
@@ -65,32 +65,35 @@ Returns dimension values transformed using the given
[extraction function](#extr
}
```
-`outputType` may also be specified in an ExtractionDimensionSpec to apply type
conversion to results before merging. If left unspecified, the `outputType`
defaults to STRING.
+`outputType` may also be specified in an `ExtractionDimensionSpec` to apply
type conversion to results before merging. If left unspecified, the
`outputType` defaults to STRING.
Review comment:
```suggestion
You can specify an `outputType` in an `ExtractionDimensionSpec` to apply
type conversion to results before merging. The `outputType` defaults to STRING
when not specified.
```
##########
File path: docs/querying/dimensionspecs.md
##########
@@ -65,32 +65,35 @@ Returns dimension values transformed using the given
[extraction function](#extr
}
```
-`outputType` may also be specified in an ExtractionDimensionSpec to apply type
conversion to results before merging. If left unspecified, the `outputType`
defaults to STRING.
+`outputType` may also be specified in an `ExtractionDimensionSpec` to apply
type conversion to results before merging. If left unspecified, the
`outputType` defaults to STRING.
Please refer to the [Output Types](#output-types) section for more details.
### Filtered DimensionSpecs
-These are only useful for multi-value dimensions. If you have a row in Apache
Druid that has a multi-value dimension with values ["v1", "v2", "v3"] and you
send a groupBy/topN query grouping by that dimension with [query
filter](filters.md) for value "v1". In the response you will get 3 rows
containing "v1", "v2" and "v3". This behavior might be unintuitive for some use
cases.
+A filtered `DimensionSpec` is only useful for multi-value dimensions. Say you
have a row in Apache Druid that has a multi-value dimension with values ["v1",
"v2", "v3"] and you send a groupBy/topN query grouping by that dimension with a
[query filter](filters.md) for a value of "v1". In the response you will get 3
rows containing "v1", "v2" and "v3". This behavior might be unintuitive for
some use cases.
+
+This happens because "query filter" is internally used on the bitmaps and only
used to match the row to be included in the query result processing. With
multi-value dimensions, "query filter" behaves like a contains check, which
will match the row with dimension value ["v1", "v2", "v3"].
+
+> See the section on "Multi-value columns" in [segment](../design/segments.md)
for more details.
Review comment:
```suggestion
See the section on "Multi-value columns" in [segment](../design/segments.md)
for more details.
```
##########
File path: docs/querying/dimensionspecs.md
##########
@@ -47,9 +47,9 @@ Returns dimension values as is and optionally renames the
dimension.
}
```
-When specifying a DimensionSpec on a numeric column, the user should include
the type of the column in the `outputType` field. If left unspecified, the
`outputType` defaults to STRING.
+When specifying a `DimensionSpec` on a numeric column, you should include the
type of the column in the `outputType` field. If left unspecified, the
`outputType` defaults to STRING.
Review comment:
```suggestion
When specifying a `DimensionSpec` on a numeric column, you should include
the type of the column in the `outputType` field. The `outputType` defaults to
STRING when not specified.
```
##########
File path: docs/querying/dimensionspecs.md
##########
@@ -65,32 +65,35 @@ Returns dimension values transformed using the given
[extraction function](#extr
}
```
-`outputType` may also be specified in an ExtractionDimensionSpec to apply type
conversion to results before merging. If left unspecified, the `outputType`
defaults to STRING.
+`outputType` may also be specified in an `ExtractionDimensionSpec` to apply
type conversion to results before merging. If left unspecified, the
`outputType` defaults to STRING.
Please refer to the [Output Types](#output-types) section for more details.
### Filtered DimensionSpecs
-These are only useful for multi-value dimensions. If you have a row in Apache
Druid that has a multi-value dimension with values ["v1", "v2", "v3"] and you
send a groupBy/topN query grouping by that dimension with [query
filter](filters.md) for value "v1". In the response you will get 3 rows
containing "v1", "v2" and "v3". This behavior might be unintuitive for some use
cases.
+A filtered `DimensionSpec` is only useful for multi-value dimensions. Say you
have a row in Apache Druid that has a multi-value dimension with values ["v1",
"v2", "v3"] and you send a groupBy/topN query grouping by that dimension with a
[query filter](filters.md) for a value of "v1". In the response you will get 3
rows containing "v1", "v2" and "v3". This behavior might be unintuitive for
some use cases.
+
+This happens because "query filter" is internally used on the bitmaps and only
used to match the row to be included in the query result processing. With
multi-value dimensions, "query filter" behaves like a contains check, which
will match the row with dimension value ["v1", "v2", "v3"].
Review comment:
```suggestion
This happens because Druid uses the "query filter" internally on the bitmaps,
and only to match the row to include in query result processing. With
multi-value dimensions, "query filter" behaves like a contains check that
matches the row with dimension value ["v1", "v2", "v3"].
```
Please make sure I didn't mangle the meaning here. This whole bit on filtered
dimensions is a little confusing.
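A small sketch might help confirm the intended meaning. It is only an illustration of the behavior the paragraph describes, not Druid's implementation: the helper names `matches_query_filter` and `explode`, the dimension name `tags`, and the second sample row are all hypothetical.

```python
def matches_query_filter(row_values, wanted):
    # For a multi-value dimension the filter acts like a "contains" check:
    # the whole row matches if any one of its values equals the wanted value.
    return wanted in row_values


def explode(rows, dimension):
    # The grouping pipeline emits one row per value of the multi-value dimension.
    for row in rows:
        for value in row[dimension]:
            yield {**row, dimension: value}


# Hypothetical data: one row with the multi-value dimension from the docs,
# plus one row that the filter should not match.
rows = [
    {"tags": ["v1", "v2", "v3"]},
    {"tags": ["v4"]},
]

selected = [row for row in rows if matches_query_filter(row["tags"], "v1")]
grouped_values = [row["tags"] for row in explode(selected, "tags")]
print(grouped_values)  # ['v1', 'v2', 'v3']
```

The filter selects the row as a whole, so all three values survive into the grouped result even though the filter asked only for "v1".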
##########
File path: docs/querying/aggregations.md
##########
@@ -138,11 +138,11 @@ To accomplish mean aggregation on ingestion, refer to the
[Quantiles aggregator]
(Double/Float/Long) First and Last aggregator cannot be used in ingestion
spec, and should only be specified as part of queries.
Review comment:
```suggestion
Do not use the (Double/Float/Long) First and Last aggregators in an ingestion
spec. They are only supported for queries.
```
##########
File path: docs/querying/aggregations.md
##########
@@ -444,8 +445,8 @@ following can be the possible output of the aggregator
| `["dim2"]` | 2 | (10) |
| `[]` | 3 | (11) |
-As illustrated in above example, output number can be thought of as an
unsigned n bit number where n is the number of dimensions passed to the
aggregator.
-The bit at position X is set in this number to 0 if a dimension at position X
in input to aggregators is included in the sub-grouping. Otherwise, this bit
+As illustrated in the example, the output number can be thought of as an
unsigned _n_ bit number where _n_ is the number of dimensions passed to the
aggregator.
+The bit at position X is set in this number to 0 if a dimension at position X
in the aggregator input is included in the sub-grouping. Otherwise, this bit
Review comment:
```suggestion
As the example illustrates, you can think of the output number as an
unsigned _n_ bit number where _n_ is the number of dimensions passed to the
aggregator.
Druid sets the bit at position X for the number to 0 if the sub-grouping
includes a dimension at position X in the aggregator input. Otherwise, Druid
sets this bit to 1.
```
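If it helps readers, the bit rule in the suggestion above could be sketched as follows. This is only an illustration, not Druid's implementation. The function name `grouping_value` is hypothetical, and the sketch assumes that the aggregator input is `["dim1", "dim2"]` and that position 0 maps to the most significant bit, which is what the table rows `["dim2"]` = 2 (10) and `[]` = 3 (11) imply.

```python
def grouping_value(dimensions, sub_grouping):
    """Return the n-bit number for a sub-grouping, per the rule described above."""
    included = set(sub_grouping)
    value = 0
    for dim in dimensions:
        # Bit is 0 when the dimension is part of the sub-grouping, 1 otherwise.
        bit = 0 if dim in included else 1
        # Shift left so earlier positions end up in the more significant bits.
        value = (value << 1) | bit
    return value


# Assumed aggregator input; only the last two rows appear in the quoted table.
dims = ["dim1", "dim2"]
for sub in (["dim1", "dim2"], ["dim1"], ["dim2"], []):
    value = grouping_value(dims, sub)
    print(sub, value, format(value, "0{}b".format(len(dims))))
```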
##########
File path: docs/querying/aggregations.md
##########
@@ -27,10 +27,10 @@ title: "Aggregations"
> language. For information about aggregators available in SQL, refer to the
> [SQL documentation](sql.md#aggregation-functions).
-Aggregations can be provided at ingestion time as part of the ingestion spec
as a way of summarizing data before it enters Apache Druid.
-Aggregations can also be specified as part of many queries at query time.
+Aggregations specified at ingestion time, from within the ingestion spec,
provide a way to summarize data before it enters Apache Druid.
+Aggregations are also used at query time.
-Available aggregations are:
+The following sections list the types of aggregations available at ingestion
and query time.
Review comment:
```suggestion
The following sections list the available aggregate functions. Unless
otherwise noted, aggregations are available at both ingestion and query time.
```
I think they are "aggregate functions", but I may be wrong here.
##########
File path: docs/querying/dimensionspecs.md
##########
@@ -65,32 +65,35 @@ Returns dimension values transformed using the given
[extraction function](#extr
}
```
-`outputType` may also be specified in an ExtractionDimensionSpec to apply type
conversion to results before merging. If left unspecified, the `outputType`
defaults to STRING.
+`outputType` may also be specified in an `ExtractionDimensionSpec` to apply
type conversion to results before merging. If left unspecified, the
`outputType` defaults to STRING.
Please refer to the [Output Types](#output-types) section for more details.
### Filtered DimensionSpecs
-These are only useful for multi-value dimensions. If you have a row in Apache
Druid that has a multi-value dimension with values ["v1", "v2", "v3"] and you
send a groupBy/topN query grouping by that dimension with [query
filter](filters.md) for value "v1". In the response you will get 3 rows
containing "v1", "v2" and "v3". This behavior might be unintuitive for some use
cases.
+A filtered `DimensionSpec` is only useful for multi-value dimensions. Say you
have a row in Apache Druid that has a multi-value dimension with values ["v1",
"v2", "v3"] and you send a groupBy/topN query grouping by that dimension with a
[query filter](filters.md) for a value of "v1". In the response you will get 3
rows containing "v1", "v2" and "v3". This behavior might be unintuitive for
some use cases.
+
+This happens because "query filter" is internally used on the bitmaps and only
used to match the row to be included in the query result processing. With
multi-value dimensions, "query filter" behaves like a contains check, which
will match the row with dimension value ["v1", "v2", "v3"].
+
+> See the section on "Multi-value columns" in [segment](../design/segments.md)
for more details.
-It happens because "query filter" is internally used on the bitmaps and only
used to match the row to be included in the query result processing. With
multi-value dimensions, "query filter" behaves like a contains check, which
will match the row with dimension value ["v1", "v2", "v3"]. Please see the
section on "Multi-value columns" in [segment](../design/segments.md) for more
details.
-Then groupBy/topN processing pipeline "explodes" all multi-value dimensions
resulting 3 rows for "v1", "v2" and "v3" each.
+Then the groupBy/topN processing pipeline "explodes" all multi-value
dimensions resulting 3 rows for "v1", "v2" and "v3" each.
-In addition to "query filter" which efficiently selects the rows to be
processed, you can use the filtered dimension spec to filter for specific
values within the values of a multi-value dimension. These dimensionSpecs take
a delegate DimensionSpec and a filtering criteria. From the "exploded" rows,
only rows matching the given filtering criteria are returned in the query
result.
+In addition to "query filter", which efficiently selects the rows to be
processed, you can use the filtered dimension spec to filter for specific
values within the values of a multi-value dimension. These dimension specs take
a delegate `DimensionSpec` and a filtering criteria. From the "exploded" rows,
only rows matching the given filtering criteria are returned in the query
result.
-The following filtered dimension spec acts as a whitelist or blacklist for
values as per the "isWhitelist" attribute value.
+The following filtered dimension spec acts as a whitelist or blacklist for
values as per the `isWhitelist` attribute value.
Review comment:
```suggestion
The following filtered dimension spec defines the values to include or
exclude as per the `isWhitelist` attribute value.
```
I'm trying to avoid whitelist/blacklist even though it's unavoidable in the code.
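For reference, the include/exclude behavior that the sentence describes can be sketched like this. It is an illustration under stated assumptions, not Druid's code: `list_filtered` is a hypothetical helper that mirrors the `values` and `isWhitelist` fields of the `listFiltered` spec, applied to the already "exploded" values.

```python
def list_filtered(exploded_values, values, is_whitelist=True):
    # `values` plays the role of the spec's "values" array.
    listed = set(values)
    if is_whitelist:
        # Default (isWhitelist = true): keep only the listed values.
        return [v for v in exploded_values if v in listed]
    # isWhitelist = false: drop the listed values and keep everything else.
    return [v for v in exploded_values if v not in listed]


exploded = ["v1", "v2", "v3"]
print(list_filtered(exploded, ["v1"]))                      # ['v1']
print(list_filtered(exploded, ["v1"], is_whitelist=False))  # ['v2', 'v3']
```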
##########
File path: docs/querying/dimensionspecs.md
##########
@@ -65,32 +65,35 @@ Returns dimension values transformed using the given
[extraction function](#extr
}
```
-`outputType` may also be specified in an ExtractionDimensionSpec to apply type
conversion to results before merging. If left unspecified, the `outputType`
defaults to STRING.
+`outputType` may also be specified in an `ExtractionDimensionSpec` to apply
type conversion to results before merging. If left unspecified, the
`outputType` defaults to STRING.
Please refer to the [Output Types](#output-types) section for more details.
### Filtered DimensionSpecs
-These are only useful for multi-value dimensions. If you have a row in Apache
Druid that has a multi-value dimension with values ["v1", "v2", "v3"] and you
send a groupBy/topN query grouping by that dimension with [query
filter](filters.md) for value "v1". In the response you will get 3 rows
containing "v1", "v2" and "v3". This behavior might be unintuitive for some use
cases.
+A filtered `DimensionSpec` is only useful for multi-value dimensions. Say you
have a row in Apache Druid that has a multi-value dimension with values ["v1",
"v2", "v3"] and you send a groupBy/topN query grouping by that dimension with a
[query filter](filters.md) for a value of "v1". In the response you will get 3
rows containing "v1", "v2" and "v3". This behavior might be unintuitive for
some use cases.
+
+This happens because "query filter" is internally used on the bitmaps and only
used to match the row to be included in the query result processing. With
multi-value dimensions, "query filter" behaves like a contains check, which
will match the row with dimension value ["v1", "v2", "v3"].
+
+> See the section on "Multi-value columns" in [segment](../design/segments.md)
for more details.
-It happens because "query filter" is internally used on the bitmaps and only
used to match the row to be included in the query result processing. With
multi-value dimensions, "query filter" behaves like a contains check, which
will match the row with dimension value ["v1", "v2", "v3"]. Please see the
section on "Multi-value columns" in [segment](../design/segments.md) for more
details.
-Then groupBy/topN processing pipeline "explodes" all multi-value dimensions
resulting 3 rows for "v1", "v2" and "v3" each.
+Then the groupBy/topN processing pipeline "explodes" all multi-value
dimensions resulting 3 rows for "v1", "v2" and "v3" each.
-In addition to "query filter" which efficiently selects the rows to be
processed, you can use the filtered dimension spec to filter for specific
values within the values of a multi-value dimension. These dimensionSpecs take
a delegate DimensionSpec and a filtering criteria. From the "exploded" rows,
only rows matching the given filtering criteria are returned in the query
result.
+In addition to "query filter", which efficiently selects the rows to be
processed, you can use the filtered dimension spec to filter for specific
values within the values of a multi-value dimension. These dimension specs take
a delegate `DimensionSpec` and a filtering criteria. From the "exploded" rows,
only rows matching the given filtering criteria are returned in the query
result.
-The following filtered dimension spec acts as a whitelist or blacklist for
values as per the "isWhitelist" attribute value.
+The following filtered dimension spec acts as a whitelist or blacklist for
values as per the `isWhitelist` attribute value.
```json
{ "type" : "listFiltered", "delegate" : <dimensionSpec>, "values": <array of
strings>, "isWhitelist": <optional attribute for true/false, default is true> }
```
-Following filtered dimension spec retains only the values matching regex. Note
that `listFiltered` is faster than this and one should use that for whitelist
or blacklist use case.
+The following filtered dimension spec retains only the values matching regex.
Note that `listFiltered` is faster than this and one should use that for
whitelist or blacklist use case.
Review comment:
```suggestion
The following filtered dimension spec retains only the values matching a
regex. You should use the `listFiltered` dimension spec for inclusion and
exclusion use cases because it is faster.
```
##########
File path: docs/querying/dimensionspecs.md
##########
@@ -32,7 +32,7 @@ The following JSON fields can be used in a query to operate
on dimension values.
## DimensionSpec
-`DimensionSpec`s define how dimension values get transformed prior to
aggregation.
+A `DimensionSpec` defines how dimension values get transformed prior to
aggregation.
Review comment:
```suggestion
A `DimensionSpec` defines how to transform dimension values prior to
aggregation.
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]