clintropolis commented on a change in pull request #6794: Query vectorization.
URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r300889968
##########
File path:
benchmarks/src/main/java/org/apache/druid/benchmark/query/SqlBenchmark.java
##########
@@ -64,39 +60,112 @@
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;
-import java.io.File;
-import java.util.HashMap;
+import javax.annotation.Nullable;
import java.util.List;
+import java.util.Map;
import java.util.concurrent.TimeUnit;
/**
- * Benchmark that compares the same groupBy query through the native query layer and through the SQL layer.
+ * Benchmark that tests various SQL queries.
*/
@State(Scope.Benchmark)
@Fork(value = 1)
@Warmup(iterations = 15)
-@Measurement(iterations = 30)
+@Measurement(iterations = 25)
public class SqlBenchmark
{
- @Param({"200000", "1000000"})
- private int rowsPerSegment;
+ static {
+ Calcites.setSystemProperties();
+ }
private static final Logger log = new Logger(SqlBenchmark.class);
- private File tmpDir;
- private SegmentGenerator segmentGenerator;
- private SpecificSegmentsQuerySegmentWalker walker;
- private SqlLifecycleFactory sqlLifecycleFactory;
- private GroupByQuery groupByQuery;
- private String sqlQuery;
- private Closer resourceCloser;
+ private static final List<String> QUERIES = ImmutableList.of(
Review comment:
nice collection of queries :+1:
Side note, my gut tells me that the sequential dimensions and metrics we
have in the benchmark schemas are probably not very typical of the data that
appears in real datasets, other than maybe the time column?
The zipf dimension in the basic schema (and all current benchmark schemas)
is pretty low cardinality because of how it's currently set up: it enumerates
all the values into an array, which takes a lot of memory and is very slow at
higher cardinalities. To get high-cardinality zipf distributions [I had to
modify the generator to make a lazy
version](https://github.com/apache/incubator-druid/pull/6016/files#diff-d0bb8eedfa1b391647e51a0658afed31R212),
which I hope to land someday; it could also help produce additional column
value distributions with higher cardinalities and different exponents.
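To make the idea concrete, here is a rough sketch (not Druid's actual generator, and hypothetical class/method names) of what "lazy" could mean: instead of expanding every value into a big weighted array, store only one cumulative-probability entry per distinct value and binary-search it on each draw, so memory scales with cardinality rather than with the total weight mass.

```java
import java.util.Random;

// Sketch of a lazy Zipf sampler: O(cardinality) memory for the CDF table,
// O(log cardinality) per sample, no enumeration of weighted values.
public class LazyZipfSampler {
  private final double[] cdf; // cumulative probability, one entry per value
  private final Random random;

  public LazyZipfSampler(int cardinality, double exponent, long seed) {
    cdf = new double[cardinality];
    // Normalization constant: sum of 1/k^exponent over all ranks.
    double norm = 0;
    for (int k = 1; k <= cardinality; k++) {
      norm += 1.0 / Math.pow(k, exponent);
    }
    double cumulative = 0;
    for (int k = 1; k <= cardinality; k++) {
      cumulative += (1.0 / Math.pow(k, exponent)) / norm;
      cdf[k - 1] = cumulative;
    }
    random = new Random(seed);
  }

  // Returns a zero-based rank; rank 0 is the most frequent value.
  public int sample() {
    double u = random.nextDouble();
    int lo = 0, hi = cdf.length - 1;
    while (lo < hi) { // binary search for the first cdf entry >= u
      int mid = (lo + hi) >>> 1;
      if (cdf[mid] < u) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }
}
```

The O(cardinality) CDF precompute is still a one-time linear pass, but a million-value dimension is a megabytes-sized table rather than an enumerated array; a fully lazy approach (e.g. rejection-inversion sampling, as used by Apache Commons Math's `ZipfDistribution`) avoids even that table.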
Where I'm going with this: I think it could be interesting in the future to
tweak these so we can run the same queries against different value
distributions and see if there is any effect, though I'm unsure how we would
handle filter value matches.
No need to do anything in this PR though, imo; just thinking out loud about
stuff to do in the future, maybe.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]