clintropolis commented on a change in pull request #6794: Query vectorization.
URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r300889968
##########
File path:
benchmarks/src/main/java/org/apache/druid/benchmark/query/SqlBenchmark.java
##########
@@ -64,39 +60,112 @@
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;
-import java.io.File;
-import java.util.HashMap;
+import javax.annotation.Nullable;
import java.util.List;
+import java.util.Map;
import java.util.concurrent.TimeUnit;
/**
- * Benchmark that compares the same groupBy query through the native query layer and through the SQL layer.
+ * Benchmark that tests various SQL queries.
*/
@State(Scope.Benchmark)
@Fork(value = 1)
@Warmup(iterations = 15)
-@Measurement(iterations = 30)
+@Measurement(iterations = 25)
public class SqlBenchmark
{
- @Param({"200000", "1000000"})
- private int rowsPerSegment;
+ static {
+ Calcites.setSystemProperties();
+ }
private static final Logger log = new Logger(SqlBenchmark.class);
- private File tmpDir;
- private SegmentGenerator segmentGenerator;
- private SpecificSegmentsQuerySegmentWalker walker;
- private SqlLifecycleFactory sqlLifecycleFactory;
- private GroupByQuery groupByQuery;
- private String sqlQuery;
- private Closer resourceCloser;
+ private static final List<String> QUERIES = ImmutableList.of(
Review comment:
nice collection of queries :+1:
Side note, my gut tells me that the sequential dimensions and metrics we
have in the benchmark schemas are probably not very typical of the data that
appears in real datasets, other than maybe the time column?
The zipf dimension in the basic schema (and all current benchmark schemas)
is pretty low cardinality because of how it's currently set up: it enumerates
all the values into an array, which takes a lot of memory and is very slow at
higher cardinalities. To get high-cardinality zipf distributions [I had to
modify the generator to make a lazy
version](https://github.com/apache/incubator-druid/pull/6016/files#diff-d0bb8eedfa1b391647e51a0658afed31R212),
which I hope to land someday; it could also help produce additional column
value distributions with higher cardinalities and different exponents.
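To make the idea concrete, here is a rough sketch (not Druid's actual generator, and hypothetical class/method names) of what "lazy" could mean: instead of expanding every value into a big weighted array, store only one cumulative-probability entry per distinct value and binary-search it on each draw, so memory scales with cardinality rather than with the total weight mass.

```java
import java.util.Random;

// Sketch of a lazy Zipf sampler: O(cardinality) memory for the CDF table,
// O(log cardinality) per sample, no enumeration of weighted values.
public class LazyZipfSampler {
  private final double[] cdf; // cumulative probability, one entry per value
  private final Random random;

  public LazyZipfSampler(int cardinality, double exponent, long seed) {
    cdf = new double[cardinality];
    // Normalization constant: sum of 1/k^exponent over all ranks.
    double norm = 0;
    for (int k = 1; k <= cardinality; k++) {
      norm += 1.0 / Math.pow(k, exponent);
    }
    double cumulative = 0;
    for (int k = 1; k <= cardinality; k++) {
      cumulative += (1.0 / Math.pow(k, exponent)) / norm;
      cdf[k - 1] = cumulative;
    }
    random = new Random(seed);
  }

  // Returns a zero-based rank; rank 0 is the most frequent value.
  public int sample() {
    double u = random.nextDouble();
    int lo = 0, hi = cdf.length - 1;
    while (lo < hi) { // binary search for the first cdf entry >= u
      int mid = (lo + hi) >>> 1;
      if (cdf[mid] < u) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }
}
```

The O(cardinality) CDF precompute is still a one-time linear pass, but a million-value dimension is a megabytes-sized table rather than an enumerated array; a fully lazy approach (e.g. rejection-inversion sampling, as used by Apache Commons Math's `ZipfDistribution`) avoids even that table.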
Where I'm going with this: I think it could be interesting in the future to
tweak these so we can run the same queries against different value
distributions and see if there is any effect, though I'm unsure how we would
handle filter value matches.
No need to do anything in this PR though, imo; just thinking out loud about
stuff to do in the future, maybe.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]