This is an automated email from the ASF dual-hosted git repository.
abhishek pushed a commit to branch 0.23.0
in repository https://gitbox.apache.org/repos/asf/druid.git
The following commit(s) were added to refs/heads/0.23.0 by this push:
new ccda449580 Enable vectorized virtual column processing by default. (#12520) (#12525)
ccda449580 is described below
commit ccda449580e39dbaf56a681894c675a7a3b2d7a7
Author: Abhishek Agarwal <[email protected]>
AuthorDate: Fri May 20 10:24:23 2022 +0530
Enable vectorized virtual column processing by default. (#12520) (#12525)
In the majority of cases, this improves performance.
There's only one case I'm aware of where this may be a net negative: for
time_floor(__time, <period>) where there are many repeated __time values. In
nonvectorized processing, SingleLongInputCachingExpressionColumnValueSelector
implements an optimization to avoid computing the time_floor function on every
row. There is no such optimization in vectorized processing.
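
The caching idea described above can be sketched roughly as follows. This is a simplified, hypothetical illustration and not the code of SingleLongInputCachingExpressionColumnValueSelector: the class name CachingLongFunctionSketch, the floorToHour helper (a stand-in for time_floor with an hourly period, UTC only), and the evaluation counter are all assumptions made for the example. It only shows why repeated __time values make the nonvectorized path cheap: the function is re-evaluated only when the input differs from the previous row.

```java
import java.util.function.LongUnaryOperator;

/** Hypothetical sketch of "cache the last input" evaluation; not Druid's actual class. */
public class CachingLongFunctionSketch {
    private final LongUnaryOperator fn;
    private boolean hasCached = false;
    private long lastInput;
    private long lastOutput;
    private int evaluations = 0; // counts real calls to fn, for illustration only

    public CachingLongFunctionSketch(LongUnaryOperator fn) {
        this.fn = fn;
    }

    /** Returns fn(input), reusing the previous result when the input repeats. */
    public long apply(long input) {
        if (!hasCached || input != lastInput) {
            lastOutput = fn.applyAsLong(input);
            lastInput = input;
            hasCached = true;
            evaluations++;
        }
        return lastOutput;
    }

    public int evaluations() {
        return evaluations;
    }

    /** Stand-in for time_floor(__time, 'PT1H') on epoch millis; ignores time zones. */
    public static long floorToHour(long millis) {
        return Math.floorDiv(millis, 3_600_000L) * 3_600_000L;
    }

    public static void main(String[] args) {
        // Five rows, but only two distinct consecutive timestamps.
        long[] times = {1_000L, 1_000L, 1_000L, 3_600_500L, 3_600_500L};
        CachingLongFunctionSketch cached =
            new CachingLongFunctionSketch(CachingLongFunctionSketch::floorToHour);
        for (long t : times) {
            System.out.println(t + " -> " + cached.apply(t));
        }
        System.out.println("evaluations: " + cached.evaluations());
    }
}
```

With the five sample rows, the function is evaluated twice rather than five times. A vectorized evaluator without such a cache would compute the floor for every row in the batch, which is the trade-off the message describes.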
IMO, we shouldn't mention this in the docs. Rationale: It's too fiddly of a
thing: it's not guaranteed that nonvectorized processing will be faster due to
the optimization, because it would have to overcome the inherent speed
advantage of vectorization. So it'd always require testing to determine the
best setting for a specific dataset. It would be bad if users disabled
vectorization thinking it would speed up their queries, and it actually slowed
them down. And even if users do their [...]
Co-authored-by: Gian Merlino <[email protected]>
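
Since the rationale above says the best setting can only be found by testing on a specific dataset, a comparison run might build two query contexts that differ only in vectorizeVirtualColumns. The sketch below is a minimal assumption-laden example: the class VectorizeContextSketch and its context helper are hypothetical, and only the key names "vectorize" and "vectorizeVirtualColumns" and the values "true", "false", and "force" come from the documentation table in this commit. How the map is attached to a query is deliberately not shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for building query context maps to compare settings. */
public class VectorizeContextSketch {

    /** Builds a context with vectorize enabled and the given vectorizeVirtualColumns value. */
    public static Map<String, Object> context(String vectorizeVirtualColumns) {
        Map<String, Object> ctx = new LinkedHashMap<>();
        // vectorize must also be enabled for virtual column vectorization to apply.
        ctx.put("vectorize", "true");
        ctx.put("vectorizeVirtualColumns", vectorizeVirtualColumns);
        return ctx;
    }

    public static void main(String[] args) {
        // Run the same query once with each context and compare timings.
        System.out.println(context("true"));
        System.out.println(context("false"));
    }
}
```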
---
docs/querying/query-context.md | 2 +-
processing/src/main/java/org/apache/druid/query/QueryContexts.java | 2 +-
.../org/apache/druid/segment/virtual/VectorizedVirtualColumnTest.java | 3 +--
3 files changed, 3 insertions(+), 4 deletions(-)
diff --git a/docs/querying/query-context.md b/docs/querying/query-context.md
index 158b59bd04..111856a46c 100644
--- a/docs/querying/query-context.md
+++ b/docs/querying/query-context.md
@@ -125,4 +125,4 @@ vectorization. These query types will ignore the "vectorize" parameter even if i
|--------|-------|------------|
|vectorize|`true`|Enables or disables vectorized query execution. Possible values are `false` (disabled), `true` (enabled if possible, disabled otherwise, on a per-segment basis), and `force` (enabled, and groupBy or timeseries queries that cannot be vectorized will fail). The `"force"` setting is meant to aid in testing, and is not generally useful in production (since real-time segments can never be processed with vectorized execution, any queries on real-time data will fail). This wil [...]
|vectorSize|`512`|Sets the row batching size for a particular query. This will override `druid.query.default.context.vectorSize` if it's set.|
-|vectorizeVirtualColumns|`false`|Enables or disables vectorized query processing of queries with virtual columns, layered on top of `vectorize` (`vectorize` must also be set to true for a query to utilize vectorization). Possible values are `false` (disabled), `true` (enabled if possible, disabled otherwise, on a per-segment basis), and `force` (enabled, and groupBy or timeseries queries with virtual columns that cannot be vectorized will fail). The `"force"` setting is meant to aid in t [...]
+|vectorizeVirtualColumns|`true`|Enables or disables vectorized query processing of queries with virtual columns, layered on top of `vectorize` (`vectorize` must also be set to true for a query to utilize vectorization). Possible values are `false` (disabled), `true` (enabled if possible, disabled otherwise, on a per-segment basis), and `force` (enabled, and groupBy or timeseries queries with virtual columns that cannot be vectorized will fail). The `"force"` setting is meant to aid in te [...]
diff --git a/processing/src/main/java/org/apache/druid/query/QueryContexts.java b/processing/src/main/java/org/apache/druid/query/QueryContexts.java
index 5768c6a067..65975d3a60 100644
--- a/processing/src/main/java/org/apache/druid/query/QueryContexts.java
+++ b/processing/src/main/java/org/apache/druid/query/QueryContexts.java
@@ -77,7 +77,7 @@ public class QueryContexts
public static final boolean DEFAULT_POPULATE_RESULTLEVEL_CACHE = true;
public static final boolean DEFAULT_USE_RESULTLEVEL_CACHE = true;
public static final Vectorize DEFAULT_VECTORIZE = Vectorize.TRUE;
- public static final Vectorize DEFAULT_VECTORIZE_VIRTUAL_COLUMN = Vectorize.FALSE;
+ public static final Vectorize DEFAULT_VECTORIZE_VIRTUAL_COLUMN = Vectorize.TRUE;
public static final int DEFAULT_PRIORITY = 0;
public static final int DEFAULT_UNCOVERED_INTERVALS_LIMIT = 0;
public static final long DEFAULT_TIMEOUT_MILLIS = TimeUnit.MINUTES.toMillis(5);
diff --git a/processing/src/test/java/org/apache/druid/segment/virtual/VectorizedVirtualColumnTest.java b/processing/src/test/java/org/apache/druid/segment/virtual/VectorizedVirtualColumnTest.java
index 126e304ae0..9bfdafeb35 100644
--- a/processing/src/test/java/org/apache/druid/segment/virtual/VectorizedVirtualColumnTest.java
+++ b/processing/src/test/java/org/apache/druid/segment/virtual/VectorizedVirtualColumnTest.java
@@ -289,9 +289,8 @@ public class VectorizedVirtualColumnTest
}
@Test
- public void testTimeseriesTrueVirtualContextCannotVectorize()
+ public void testTimeseriesTrueVirtualContextDefault()
{
- expectNonvectorized();
testTimeseries(
ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ColumnType.FLOAT),
CONTEXT_USE_DEFAULTS,