iChauster commented on PR #13426: URL: https://github.com/apache/arrow/pull/13426#issuecomment-1196029255
Previous description:

Hi @westonpace,

Here is a _very_ primitive version of our Asof Join benchmarks (`asof_join_benchmark.cc`). Our main goal is to benchmark four properties: the effect of table density (the frequency of rows, e.g., a row every 2s as opposed to every 1h over some time range), table width (# of columns), tids (# of keys), and multi-table joins. We also have a baseline comparison benchmark with hash joins (which is currently in this file).

I think this needs some work before it goes into Arrow. We currently run this benchmark by generating `.feather` files with Python via bamboo-streaming's `datagen.py` to represent each table, and then reading them in from C++ (see `make_arrow_ipc_reader_node`). We may want to write a utility that does this in C++ while varying the properties mentioned above, or find a way to generate those files as part of the benchmark.

There are also quite a large number of `BENCHMARK_CAPTURE` statements, as an immediate workaround for some limitations in Google Benchmark. I haven't found a good, non-verbose way to pass in the parameters needed (strings and vectors) while also having readable titles and details about the benchmark written to the output file.

Let me know if you have any advice about this or know someone who does.
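One possible way to cut down the `BENCHMARK_CAPTURE` repetition is to build the parameter sweep and the readable titles in code, then register each case programmatically (Google Benchmark provides `benchmark::RegisterBenchmark` for registering a callable under a name at runtime). The sketch below is only an illustration of the sweep and naming part, using the standard library alone. `AsofJoinCase`, `CaseName`, and `MakeSweep` are hypothetical names that are not in the PR, and the title format is an assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical parameter bundle for one asof-join benchmark case.
// None of these names come from asof_join_benchmark.cc.
struct AsofJoinCase {
  int row_interval_seconds;  // table density: one row every N seconds
  int num_columns;           // table width
  int num_ids;               // number of distinct keys
  int num_tables;            // number of tables in the join
};

// Builds a readable title that encodes every parameter of the case,
// so the benchmark output identifies each run without extra lookup.
std::string CaseName(const AsofJoinCase& c) {
  return "AsofJoin/interval_s:" + std::to_string(c.row_interval_seconds) +
         "/cols:" + std::to_string(c.num_columns) +
         "/ids:" + std::to_string(c.num_ids) +
         "/tables:" + std::to_string(c.num_tables);
}

// Expands the cross product of the four parameter lists into cases.
// The interval list varies slowest and the table-count list varies fastest.
std::vector<AsofJoinCase> MakeSweep(const std::vector<int>& intervals,
                                    const std::vector<int>& widths,
                                    const std::vector<int>& ids,
                                    const std::vector<int>& tables) {
  std::vector<AsofJoinCase> cases;
  for (int interval : intervals) {
    for (int width : widths) {
      for (int id_count : ids) {
        for (int table_count : tables) {
          cases.push_back({interval, width, id_count, table_count});
        }
      }
    }
  }
  return cases;
}
```

Each `AsofJoinCase` returned by `MakeSweep` could then be captured by a lambda and registered under `CaseName(c)`, which would keep the titles readable in the output file while replacing the hand-written list of captures with a loop. Whether this fits the existing benchmark setup is untested here.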
