This is an automated email from the ASF dual-hosted git repository.
abhishek pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git
The following commit(s) were added to refs/heads/master by this push:
new 5b6727f319 Enable vectorized virtual column processing by default. (#12520)
5b6727f319 is described below
commit 5b6727f3195ac9bad906e3416bf8997b069c222f
Author: Gian Merlino <[email protected]>
AuthorDate: Mon May 16 03:13:53 2022 -0700
Enable vectorized virtual column processing by default. (#12520)
In the majority of cases, this improves performance.
There's only one case I'm aware of where this may be a net negative: for
time_floor(__time, <period>) where there are many repeated __time values. In
nonvectorized processing, SingleLongInputCachingExpressionColumnValueSelector
implements an optimization to avoid computing the time_floor function on every
row. There is no such optimization in vectorized processing.
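For readers unfamiliar with that optimization, here is a minimal sketch of the general idea. It assumes the simplest form of caching: remember only the most recent input and its output. The class `LastValueCachingEvaluator` and its methods are hypothetical and are not the API of Druid's `SingleLongInputCachingExpressionColumnValueSelector`. The lambda is only a stand-in for `time_floor(__time, 'PT1H')` on non-negative UTC epoch millis.

```java
import java.util.function.LongUnaryOperator;

/**
 * Hypothetical sketch: when consecutive rows share the same long input
 * (e.g. repeated __time values), reuse the previous result instead of
 * re-evaluating the expression. Not Druid's actual implementation.
 */
final class LastValueCachingEvaluator
{
  private final LongUnaryOperator expression;
  private boolean hasCachedValue = false;
  private long lastInput;
  private long lastOutput;
  private int evaluations = 0;

  LastValueCachingEvaluator(LongUnaryOperator expression)
  {
    this.expression = expression;
  }

  long evaluate(long input)
  {
    // Recompute only when the input differs from the previous row's input.
    if (!hasCachedValue || input != lastInput) {
      lastInput = input;
      lastOutput = expression.applyAsLong(input);
      hasCachedValue = true;
      evaluations++;
    }
    return lastOutput;
  }

  int evaluations()
  {
    return evaluations;
  }

  public static void main(String[] args)
  {
    final long hourMillis = 3_600_000L;
    // Stand-in for an hourly time_floor on non-negative epoch millis.
    LastValueCachingEvaluator floorToHour =
        new LastValueCachingEvaluator(t -> t - (t % hourMillis));
    long[] times = {7_200_000L, 7_200_000L, 7_200_000L, 7_260_000L, 7_260_000L};
    for (long t : times) {
      System.out.println(floorToHour.evaluate(t)); // prints 7200000 each time
    }
    System.out.println("evaluations: " + floorToHour.evaluations()); // prints "evaluations: 2"
  }
}
```

On the five sample rows the expression runs only twice, because the input changes only once. Per the description above, vectorized processing has no equivalent of this per-row shortcut.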
IMO, we shouldn't mention this in the docs. Rationale: it's too fiddly. It's
not guaranteed that nonvectorized processing will be faster because of the
optimization, since the optimization would have to overcome the inherent speed
advantage of vectorization. So finding the best setting for a specific dataset
would always require testing. It would be bad if users disabled vectorization
thinking it would speed up their queries, only for it to slow them down. And
even if users do their [...]
---
docs/querying/query-context.md | 2 +-
processing/src/main/java/org/apache/druid/query/QueryContexts.java | 2 +-
.../org/apache/druid/segment/virtual/VectorizedVirtualColumnTest.java | 3 +--
3 files changed, 3 insertions(+), 4 deletions(-)
diff --git a/docs/querying/query-context.md b/docs/querying/query-context.md
index 8c88e8749d..bded042ecb 100644
--- a/docs/querying/query-context.md
+++ b/docs/querying/query-context.md
@@ -126,4 +126,4 @@ vectorization. These query types will ignore the "vectorize" parameter even if i
|--------|-------|------------|
|vectorize|`true`|Enables or disables vectorized query execution. Possible values are `false` (disabled), `true` (enabled if possible, disabled otherwise, on a per-segment basis), and `force` (enabled, and groupBy or timeseries queries that cannot be vectorized will fail). The `"force"` setting is meant to aid in testing, and is not generally useful in production (since real-time segments can never be processed with vectorized execution, any queries on real-time data will fail). This wil [...]
|vectorSize|`512`|Sets the row batching size for a particular query. This will override `druid.query.default.context.vectorSize` if it's set.|
-|vectorizeVirtualColumns|`false`|Enables or disables vectorized query processing of queries with virtual columns, layered on top of `vectorize` (`vectorize` must also be set to true for a query to utilize vectorization). Possible values are `false` (disabled), `true` (enabled if possible, disabled otherwise, on a per-segment basis), and `force` (enabled, and groupBy or timeseries queries with virtual columns that cannot be vectorized will fail). The `"force"` setting is meant to aid in t [...]
+|vectorizeVirtualColumns|`true`|Enables or disables vectorized query processing of queries with virtual columns, layered on top of `vectorize` (`vectorize` must also be set to true for a query to utilize vectorization). Possible values are `false` (disabled), `true` (enabled if possible, disabled otherwise, on a per-segment basis), and `force` (enabled, and groupBy or timeseries queries with virtual columns that cannot be vectorized will fail). The `"force"` setting is meant to aid in te [...]
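The three documented settings, and the layering of `vectorizeVirtualColumns` on top of `vectorize`, can be modeled as a per-segment decision. This is an illustrative sketch only: `VectorizeDecisionSketch`, its nested `Mode` enum and the `decide` method are hypothetical. How `force` interacts across the two parameters is an assumption, not a description of Druid's code.

```java
/**
 * Hypothetical model of the documented vectorize / vectorizeVirtualColumns
 * settings. Not Druid's implementation.
 */
final class VectorizeDecisionSketch
{
  enum Mode
  {
    FALSE, TRUE, FORCE;

    /** Per-segment decision for a single setting. */
    boolean shouldVectorize(boolean canVectorize)
    {
      switch (this) {
        case FALSE:
          return false;
        case TRUE:
          // Enabled if possible, disabled otherwise.
          return canVectorize;
        case FORCE:
          // Enabled; fail instead of falling back.
          if (!canVectorize) {
            throw new IllegalStateException("Cannot vectorize!");
          }
          return true;
        default:
          throw new AssertionError(this);
      }
    }
  }

  /**
   * Assumed layering: `vectorize` must allow vectorization, and when the query
   * has virtual columns, `vectorizeVirtualColumns` must allow it too.
   */
  static boolean decide(
      Mode vectorize,
      Mode vectorizeVirtualColumns,
      boolean hasVirtualColumns,
      boolean virtualColumnsCanVectorize,
      boolean restCanVectorize
  )
  {
    if (vectorize == Mode.FALSE) {
      return false;
    }
    boolean virtualColumnsOk = !hasVirtualColumns
        || vectorizeVirtualColumns.shouldVectorize(virtualColumnsCanVectorize);
    return vectorize.shouldVectorize(restCanVectorize && virtualColumnsOk);
  }

  public static void main(String[] args)
  {
    // New defaults: both TRUE, everything vectorizable.
    System.out.println(decide(Mode.TRUE, Mode.TRUE, true, true, true));  // true
    // Virtual columns that cannot be vectorized: fall back, no failure.
    System.out.println(decide(Mode.TRUE, Mode.TRUE, true, false, true)); // false
    // Previous default for vectorizeVirtualColumns.
    System.out.println(decide(Mode.TRUE, Mode.FALSE, true, true, true)); // false
  }
}
```

With both parameters at their new default (`true`), a segment whose virtual columns cannot be vectorized falls back to nonvectorized processing rather than failing. Only `force` turns that situation into an error.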
diff --git a/processing/src/main/java/org/apache/druid/query/QueryContexts.java b/processing/src/main/java/org/apache/druid/query/QueryContexts.java
index a4defa7b08..73bf04fcae 100644
--- a/processing/src/main/java/org/apache/druid/query/QueryContexts.java
+++ b/processing/src/main/java/org/apache/druid/query/QueryContexts.java
@@ -78,7 +78,7 @@ public class QueryContexts
public static final boolean DEFAULT_POPULATE_RESULTLEVEL_CACHE = true;
public static final boolean DEFAULT_USE_RESULTLEVEL_CACHE = true;
public static final Vectorize DEFAULT_VECTORIZE = Vectorize.TRUE;
- public static final Vectorize DEFAULT_VECTORIZE_VIRTUAL_COLUMN = Vectorize.FALSE;
+ public static final Vectorize DEFAULT_VECTORIZE_VIRTUAL_COLUMN = Vectorize.TRUE;
public static final int DEFAULT_PRIORITY = 0;
public static final int DEFAULT_UNCOVERED_INTERVALS_LIMIT = 0;
public static final long DEFAULT_TIMEOUT_MILLIS = TimeUnit.MINUTES.toMillis(5);
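The practical effect of this one-line change is on queries that do not set `vectorizeVirtualColumns` in their context. Below is a minimal sketch of a context lookup with a default. It assumes the context is a plain `Map<String, Object>` and that booleans and the strings `true`/`false`/`force` are accepted. `VectorizeContextSketch` and `resolve` are hypothetical names, not Druid's `QueryContexts` API.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class VectorizeContextSketch
{
  enum Mode { FALSE, TRUE, FORCE }

  // Mirrors the new default from this commit (previously FALSE).
  static final Mode DEFAULT_VECTORIZE_VIRTUAL_COLUMNS = Mode.TRUE;

  /**
   * Hypothetical lookup: an explicit context value wins; otherwise the default
   * applies. Accepts booleans or "true"/"false"/"force" in any case.
   */
  static Mode resolve(Map<String, Object> context, String key, Mode defaultMode)
  {
    Object value = context.get(key);
    if (value == null) {
      return defaultMode;
    }
    if (value instanceof Boolean) {
      return ((Boolean) value) ? Mode.TRUE : Mode.FALSE;
    }
    // Throws IllegalArgumentException for unrecognized strings.
    return Mode.valueOf(value.toString().trim().toUpperCase(Locale.ROOT));
  }

  public static void main(String[] args)
  {
    Map<String, Object> context = new HashMap<>();
    // No explicit setting: the default (now TRUE) applies.
    System.out.println(resolve(context, "vectorizeVirtualColumns", DEFAULT_VECTORIZE_VIRTUAL_COLUMNS)); // TRUE
    // Opting out per query restores the previous behavior.
    context.put("vectorizeVirtualColumns", false);
    System.out.println(resolve(context, "vectorizeVirtualColumns", DEFAULT_VECTORIZE_VIRTUAL_COLUMNS)); // FALSE
  }
}
```

In this sketch, unrecognized strings make `Mode.valueOf` throw `IllegalArgumentException`. A real implementation may report invalid values differently.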
diff --git a/processing/src/test/java/org/apache/druid/segment/virtual/VectorizedVirtualColumnTest.java b/processing/src/test/java/org/apache/druid/segment/virtual/VectorizedVirtualColumnTest.java
index 126e304ae0..9bfdafeb35 100644
--- a/processing/src/test/java/org/apache/druid/segment/virtual/VectorizedVirtualColumnTest.java
+++ b/processing/src/test/java/org/apache/druid/segment/virtual/VectorizedVirtualColumnTest.java
@@ -289,9 +289,8 @@ public class VectorizedVirtualColumnTest
}
@Test
- public void testTimeseriesTrueVirtualContextCannotVectorize()
+ public void testTimeseriesTrueVirtualContextDefault()
{
- expectNonvectorized();
testTimeseries(
ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ColumnType.FLOAT),
CONTEXT_USE_DEFAULTS,