[druid] branch master updated: Enable vectorized virtual column processing by default. (#12520)

abhishek Mon, 16 May 2022 03:14:11 -0700

This is an automated email from the ASF dual-hosted git repository.

abhishek pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git



The following commit(s) were added to refs/heads/master by this push:
     new 5b6727f319 Enable vectorized virtual column processing by default. (#12520)
5b6727f319 is described below

commit 5b6727f3195ac9bad906e3416bf8997b069c222f
Author: Gian Merlino <[email protected]>
AuthorDate: Mon May 16 03:13:53 2022 -0700

    Enable vectorized virtual column processing by default. (#12520)
    
    In the majority of cases, this improves performance.
    
    There's only one case I'm aware of where this may be a net negative: for time_floor(__time, <period>) where there are many repeated __time values. In nonvectorized processing, SingleLongInputCachingExpressionColumnValueSelector implements an optimization to avoid computing the time_floor function on every row. There is no such optimization in vectorized processing.
    
    IMO, we shouldn't mention this in the docs. Rationale: It's too fiddly of a thing: it's not guaranteed that nonvectorized processing will be faster due to the optimization, because it would have to overcome the inherent speed advantage of vectorization. So it'd always require testing to determine the best setting for a specific dataset. It would be bad if users disabled vectorization thinking it would speed up their queries, and it actually slowed them down. And even if users do their [...]
---
 docs/querying/query-context.md                                         | 2 +-
 processing/src/main/java/org/apache/druid/query/QueryContexts.java     | 2 +-
 .../org/apache/druid/segment/virtual/VectorizedVirtualColumnTest.java  | 3 +--
 3 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/docs/querying/query-context.md b/docs/querying/query-context.md
index 8c88e8749d..bded042ecb 100644
--- a/docs/querying/query-context.md
+++ b/docs/querying/query-context.md
@@ -126,4 +126,4 @@ vectorization. These query types will ignore the "vectorize" parameter even if i
 |--------|-------|------------|
 |vectorize|`true`|Enables or disables vectorized query execution. Possible values are `false` (disabled), `true` (enabled if possible, disabled otherwise, on a per-segment basis), and `force` (enabled, and groupBy or timeseries queries that cannot be vectorized will fail). The `"force"` setting is meant to aid in testing, and is not generally useful in production (since real-time segments can never be processed with vectorized execution, any queries on real-time data will fail). This wil [...]
 |vectorSize|`512`|Sets the row batching size for a particular query. This will override `druid.query.default.context.vectorSize` if it's set.|
-|vectorizeVirtualColumns|`false`|Enables or disables vectorized query processing of queries with virtual columns, layered on top of `vectorize` (`vectorize` must also be set to true for a query to utilize vectorization). Possible values are `false` (disabled), `true` (enabled if possible, disabled otherwise, on a per-segment basis), and `force` (enabled, and groupBy or timeseries queries with virtual columns that cannot be vectorized will fail). The `"force"` setting is meant to aid in t [...]
+|vectorizeVirtualColumns|`true`|Enables or disables vectorized query processing of queries with virtual columns, layered on top of `vectorize` (`vectorize` must also be set to true for a query to utilize vectorization). Possible values are `false` (disabled), `true` (enabled if possible, disabled otherwise, on a per-segment basis), and `force` (enabled, and groupBy or timeseries queries with virtual columns that cannot be vectorized will fail). The `"force"` setting is meant to aid in te [...]
diff --git a/processing/src/main/java/org/apache/druid/query/QueryContexts.java b/processing/src/main/java/org/apache/druid/query/QueryContexts.java
index a4defa7b08..73bf04fcae 100644
--- a/processing/src/main/java/org/apache/druid/query/QueryContexts.java
+++ b/processing/src/main/java/org/apache/druid/query/QueryContexts.java
@@ -78,7 +78,7 @@ public class QueryContexts
   public static final boolean DEFAULT_POPULATE_RESULTLEVEL_CACHE = true;
   public static final boolean DEFAULT_USE_RESULTLEVEL_CACHE = true;
   public static final Vectorize DEFAULT_VECTORIZE = Vectorize.TRUE;
-  public static final Vectorize DEFAULT_VECTORIZE_VIRTUAL_COLUMN = Vectorize.FALSE;
+  public static final Vectorize DEFAULT_VECTORIZE_VIRTUAL_COLUMN = Vectorize.TRUE;
   public static final int DEFAULT_PRIORITY = 0;
   public static final int DEFAULT_UNCOVERED_INTERVALS_LIMIT = 0;
   public static final long DEFAULT_TIMEOUT_MILLIS = TimeUnit.MINUTES.toMillis(5);
diff --git a/processing/src/test/java/org/apache/druid/segment/virtual/VectorizedVirtualColumnTest.java b/processing/src/test/java/org/apache/druid/segment/virtual/VectorizedVirtualColumnTest.java
index 126e304ae0..9bfdafeb35 100644
--- a/processing/src/test/java/org/apache/druid/segment/virtual/VectorizedVirtualColumnTest.java
+++ b/processing/src/test/java/org/apache/druid/segment/virtual/VectorizedVirtualColumnTest.java
@@ -289,9 +289,8 @@ public class VectorizedVirtualColumnTest
   }
 
   @Test
-  public void testTimeseriesTrueVirtualContextCannotVectorize()
+  public void testTimeseriesTrueVirtualContextDefault()
   {
-    expectNonvectorized();
     testTimeseries(
         ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ColumnType.FLOAT),
         CONTEXT_USE_DEFAULTS,

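The documentation row changed in this commit describes `vectorizeVirtualColumns` as a per-query context parameter layered on top of `vectorize`, with possible values `false`, `true`, and `force`. As a rough illustration only, the sketch below assembles such a context as a plain map for a query that opts out of the new default. The class and method names are hypothetical, and how the map is attached to an actual Druid query is not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class QueryContextSketch {
    // The three values listed in the documentation row touched by this commit.
    static final Set<String> ALLOWED_MODES = Set.of("false", "true", "force");

    /**
     * Hypothetical helper: builds a query context map that keeps "vectorize"
     * enabled and sets "vectorizeVirtualColumns" to the requested mode.
     */
    static Map<String, Object> contextWithVirtualColumnMode(String mode) {
        if (!ALLOWED_MODES.contains(mode)) {
            throw new IllegalArgumentException("Unsupported mode: " + mode);
        }
        Map<String, Object> context = new LinkedHashMap<>();
        // Per the docs, vectorize must also be enabled for virtual columns to vectorize.
        context.put("vectorize", "true");
        context.put("vectorizeVirtualColumns", mode);
        return context;
    }

    public static void main(String[] args) {
        // Opt a single query out of vectorized virtual column processing.
        System.out.println(contextWithVirtualColumnMode("false"));
    }
}
```

Whether opting out helps a given query would still need to be measured, as the commit message notes.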

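The commit message mentions that nonvectorized processing can avoid recomputing `time_floor` when many rows share the same `__time` value. The sketch below illustrates that general idea with a last-value cache around a long-to-long function. It is not the code of `SingleLongInputCachingExpressionColumnValueSelector`: the class names, the single-entry cache, and the simplified fixed-width floor (no time zones or calendar periods) are all assumptions made for illustration.

```java
import java.time.Duration;
import java.util.function.LongUnaryOperator;

public class CachingTimeFloorSketch {
    /** Wraps a long -> long function and reuses the previous result when the input repeats. */
    static final class LastValueCache implements LongUnaryOperator {
        private final LongUnaryOperator delegate;
        private boolean hasValue = false;
        private long lastInput;
        private long lastOutput;
        int delegateCalls = 0; // counts how often the wrapped function actually ran

        LastValueCache(LongUnaryOperator delegate) {
            this.delegate = delegate;
        }

        @Override
        public long applyAsLong(long input) {
            if (!hasValue || input != lastInput) {
                lastOutput = delegate.applyAsLong(input);
                lastInput = input;
                hasValue = true;
                delegateCalls++;
            }
            return lastOutput;
        }
    }

    /** Simplified stand-in for time_floor: rounds down to a fixed-width period in milliseconds. */
    static long floorToPeriod(long timestampMillis, long periodMillis) {
        return Math.floorDiv(timestampMillis, periodMillis) * periodMillis;
    }

    public static void main(String[] args) {
        long hour = Duration.ofHours(1).toMillis();
        LastValueCache cached = new LastValueCache(t -> floorToPeriod(t, hour));

        // Five rows, but only two distinct timestamps in consecutive runs.
        long[] times = {3_600_001L, 3_600_001L, 3_600_001L, 7_200_500L, 7_200_500L};
        for (long t : times) {
            System.out.println(t + " -> " + cached.applyAsLong(t));
        }
        System.out.println("delegate calls: " + cached.delegateCalls); // prints "delegate calls: 2"
    }
}
```

A cache like this only pays off when equal inputs arrive in runs, which is why the commit message frames the benefit as dataset-dependent and something to test rather than assume.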
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
