[druid] branch 0.23.0 updated: Enable vectorized virtual column processing by default. (#12520) (#12525)

abhishek Thu, 19 May 2022 21:54:37 -0700

This is an automated email from the ASF dual-hosted git repository.

abhishek pushed a commit to branch 0.23.0
in repository https://gitbox.apache.org/repos/asf/druid.git



The following commit(s) were added to refs/heads/0.23.0 by this push:
     new ccda449580 Enable vectorized virtual column processing by default. (#12520) (#12525)
ccda449580 is described below

commit ccda449580e39dbaf56a681894c675a7a3b2d7a7
Author: Abhishek Agarwal <[email protected]>
AuthorDate: Fri May 20 10:24:23 2022 +0530

    Enable vectorized virtual column processing by default. (#12520) (#12525)
    
    In the majority of cases, this improves performance.
    
    There's only one case I'm aware of where this may be a net negative: for time_floor(__time, <period>) where there are many repeated __time values. In nonvectorized processing, SingleLongInputCachingExpressionColumnValueSelector implements an optimization to avoid computing the time_floor function on every row. There is no such optimization in vectorized processing.
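
Illustration (not part of the commit): the kind of optimization described above can be sketched as a wrapper that remembers the last input and its result, so a run of repeated __time values triggers only one evaluation. This is a minimal sketch under stated assumptions. The class name CachingLongFunction and the evaluation counter are hypothetical, and the actual internals of SingleLongInputCachingExpressionColumnValueSelector are not shown in this message and may differ.

```java
import java.util.function.LongUnaryOperator;

/**
 * Hypothetical sketch: caches the result of a single-long-input function
 * for the most recently seen input. Not Druid's actual implementation.
 */
public class CachingLongFunction {
    private final LongUnaryOperator fn;
    private boolean hasCached = false;
    private long lastInput;
    private long lastOutput;
    private int evaluations = 0; // counts real evaluations, for demonstration only

    public CachingLongFunction(LongUnaryOperator fn) {
        this.fn = fn;
    }

    public long apply(long input) {
        // Recompute only when the input differs from the previous row's input.
        if (!hasCached || input != lastInput) {
            lastOutput = fn.applyAsLong(input);
            lastInput = input;
            hasCached = true;
            evaluations++;
        }
        return lastOutput;
    }

    public int evaluations() {
        return evaluations;
    }

    public static void main(String[] args) {
        final long hourMillis = 3_600_000L;
        // Stand-in for time_floor(__time, 'PT1H') on epoch milliseconds.
        CachingLongFunction floorToHour =
            new CachingLongFunction(t -> Math.floorDiv(t, hourMillis) * hourMillis);

        long[] times = {1000L, 1000L, 1000L, 3_600_500L, 3_600_500L};
        for (long t : times) {
            System.out.println(floorToHour.apply(t));
        }
        // Five rows, but the function body ran only twice.
        System.out.println("evaluations: " + floorToHour.evaluations());
    }
}
```

Whether skipping evaluations this way beats batch-at-a-time vectorized evaluation depends on how many consecutive values repeat, which is why the message below says it would always require testing.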
    
    IMO, we shouldn't mention this in the docs. Rationale: It's too fiddly of a thing: it's not guaranteed that nonvectorized processing will be faster due to the optimization, because it would have to overcome the inherent speed advantage of vectorization. So it'd always require testing to determine the best setting for a specific dataset. It would be bad if users disabled vectorization thinking it would speed up their queries, and it actually slowed them down. And even if users do their [...]
    
    Co-authored-by: Gian Merlino <[email protected]>
---
 docs/querying/query-context.md                                         | 2 +-
 processing/src/main/java/org/apache/druid/query/QueryContexts.java     | 2 +-
 .../org/apache/druid/segment/virtual/VectorizedVirtualColumnTest.java  | 3 +--
 3 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/docs/querying/query-context.md b/docs/querying/query-context.md
index 158b59bd04..111856a46c 100644
--- a/docs/querying/query-context.md
+++ b/docs/querying/query-context.md
@@ -125,4 +125,4 @@ vectorization. These query types will ignore the "vectorize" parameter even if i
 |--------|-------|------------|
 |vectorize|`true`|Enables or disables vectorized query execution. Possible values are `false` (disabled), `true` (enabled if possible, disabled otherwise, on a per-segment basis), and `force` (enabled, and groupBy or timeseries queries that cannot be vectorized will fail). The `"force"` setting is meant to aid in testing, and is not generally useful in production (since real-time segments can never be processed with vectorized execution, any queries on real-time data will fail). This wil [...]
 |vectorSize|`512`|Sets the row batching size for a particular query. This will override `druid.query.default.context.vectorSize` if it's set.|
-|vectorizeVirtualColumns|`false`|Enables or disables vectorized query processing of queries with virtual columns, layered on top of `vectorize` (`vectorize` must also be set to true for a query to utilize vectorization). Possible values are `false` (disabled), `true` (enabled if possible, disabled otherwise, on a per-segment basis), and `force` (enabled, and groupBy or timeseries queries with virtual columns that cannot be vectorized will fail). The `"force"` setting is meant to aid in t [...]
+|vectorizeVirtualColumns|`true`|Enables or disables vectorized query processing of queries with virtual columns, layered on top of `vectorize` (`vectorize` must also be set to true for a query to utilize vectorization). Possible values are `false` (disabled), `true` (enabled if possible, disabled otherwise, on a per-segment basis), and `force` (enabled, and groupBy or timeseries queries with virtual columns that cannot be vectorized will fail). The `"force"` setting is meant to aid in te [...]
diff --git a/processing/src/main/java/org/apache/druid/query/QueryContexts.java b/processing/src/main/java/org/apache/druid/query/QueryContexts.java
index 5768c6a067..65975d3a60 100644
--- a/processing/src/main/java/org/apache/druid/query/QueryContexts.java
+++ b/processing/src/main/java/org/apache/druid/query/QueryContexts.java
@@ -77,7 +77,7 @@ public class QueryContexts
   public static final boolean DEFAULT_POPULATE_RESULTLEVEL_CACHE = true;
   public static final boolean DEFAULT_USE_RESULTLEVEL_CACHE = true;
   public static final Vectorize DEFAULT_VECTORIZE = Vectorize.TRUE;
-  public static final Vectorize DEFAULT_VECTORIZE_VIRTUAL_COLUMN = Vectorize.FALSE;
+  public static final Vectorize DEFAULT_VECTORIZE_VIRTUAL_COLUMN = Vectorize.TRUE;
   public static final int DEFAULT_PRIORITY = 0;
   public static final int DEFAULT_UNCOVERED_INTERVALS_LIMIT = 0;
   public static final long DEFAULT_TIMEOUT_MILLIS = TimeUnit.MINUTES.toMillis(5);
diff --git a/processing/src/test/java/org/apache/druid/segment/virtual/VectorizedVirtualColumnTest.java b/processing/src/test/java/org/apache/druid/segment/virtual/VectorizedVirtualColumnTest.java
index 126e304ae0..9bfdafeb35 100644
--- a/processing/src/test/java/org/apache/druid/segment/virtual/VectorizedVirtualColumnTest.java
+++ b/processing/src/test/java/org/apache/druid/segment/virtual/VectorizedVirtualColumnTest.java
@@ -289,9 +289,8 @@ public class VectorizedVirtualColumnTest
   }
 
   @Test
-  public void testTimeseriesTrueVirtualContextCannotVectorize()
+  public void testTimeseriesTrueVirtualContextDefault()
   {
-    expectNonvectorized();
     testTimeseries(
         ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ColumnType.FLOAT),
         CONTEXT_USE_DEFAULTS,
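
Illustration (not part of the commit): the three settings documented in query-context.md (`false`, `true`, `force`) and the way `vectorizeVirtualColumns` is layered on top of `vectorize` can be modeled roughly as below. This is a sketch under stated assumptions. The class VectorizeModeSketch, the nested Mode enum, and the decide helper are hypothetical names, and the exact way Druid combines the two settings is assumed here, not taken from the diff.

```java
/**
 * Hypothetical model of the documented vectorize settings.
 * Names and the combination logic are assumptions for illustration.
 */
public class VectorizeModeSketch {
    enum Mode {
        FALSE, TRUE, FORCE;

        /** Decide whether to vectorize, given whether it is possible at all. */
        boolean shouldVectorize(boolean canVectorize) {
            switch (this) {
                case FALSE:
                    return false;            // disabled
                case TRUE:
                    return canVectorize;     // enabled if possible
                case FORCE:
                    if (!canVectorize) {     // enabled, fail if not possible
                        throw new IllegalStateException("Cannot vectorize");
                    }
                    return true;
                default:
                    throw new AssertionError(this);
            }
        }
    }

    /**
     * Assumed layering: virtual columns only block vectorization when present,
     * and the overall "vectorize" setting has the final say.
     */
    static boolean decide(Mode vectorize, Mode vectorizeVirtualColumns,
                          boolean hasVirtualColumns, boolean virtualColumnsVectorizable) {
        boolean can = !hasVirtualColumns
            || vectorizeVirtualColumns.shouldVectorize(virtualColumnsVectorizable);
        return vectorize.shouldVectorize(can);
    }

    public static void main(String[] args) {
        // Old defaults (vectorize=true, vectorizeVirtualColumns=false).
        System.out.println(decide(Mode.TRUE, Mode.FALSE, true, true));
        // New defaults (both true).
        System.out.println(decide(Mode.TRUE, Mode.TRUE, true, true));
    }
}
```

Under this model, a query with a vectorizable virtual column runs nonvectorized with the old defaults and vectorized with the new ones, which is consistent with the test above no longer calling expectNonvectorized().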


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
