iChauster commented on PR #13426:
URL: https://github.com/apache/arrow/pull/13426#issuecomment-1196029255

   Previous description: 
   
   Hi @westonpace,
   
   Here is a _very_ primitive version of our Asof Join benchmarks 
(`asof_join_benchmark.cc`). Our main goal is to measure the effect of four 
factors: table density (the frequency of rows, e.g., a row every 2s as opposed 
to every 1h over some time range), table width (# of columns), tids (# of 
keys), and multi-table joins. We also have a baseline comparison benchmark 
with hash joins (which is currently in this file).
   
   I think this needs some work before it goes into Arrow. We currently run 
this benchmark by generating `.feather` files with Python via 
bamboo-streaming's datagen.py to represent each table, and then reading them in 
through C++ (see `make_arrow_ipc_reader_node`). We perhaps want to write a 
utility that allows us to do this in C++ while varying the parameters I've 
mentioned above, or find a way to generate those files as part of the 
benchmark.
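
   As a rough illustration of what such a C++ generator could look like, here 
is a minimal sketch that uses only the standard library (no Arrow types). The 
names `TableSpec`, `Table`, and `GenerateTable` are hypothetical and do not 
come from this PR or from datagen.py. The sketch only shows how density, width, 
and number of keys could drive the shape of a generated table.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical parameter set; field names are illustrative assumptions.
struct TableSpec {
  int64_t start_time;   // inclusive start of the time range, in seconds
  int64_t end_time;     // exclusive end of the time range, in seconds
  int64_t period;       // seconds between timestamps (controls density)
  int32_t num_keys;     // distinct ids emitted per timestamp
  int32_t num_columns;  // payload columns (controls width)
};

// Plain column-oriented stand-in for a table; a real version would build
// Arrow arrays instead.
struct Table {
  std::vector<int64_t> time;
  std::vector<int32_t> key;
  std::vector<std::vector<double>> columns;
};

// Emits one row per (timestamp, key) pair, ordered by time and then by key.
Table GenerateTable(const TableSpec& spec) {
  assert(spec.period > 0 && spec.end_time >= spec.start_time);
  assert(spec.num_keys > 0 && spec.num_columns >= 0);
  Table table;
  table.columns.resize(spec.num_columns);
  for (int64_t t = spec.start_time; t < spec.end_time; t += spec.period) {
    for (int32_t k = 0; k < spec.num_keys; ++k) {
      table.time.push_back(t);
      table.key.push_back(k);
      for (int32_t c = 0; c < spec.num_columns; ++c) {
        // Deterministic filler value so repeated runs see identical data.
        table.columns[c].push_back(static_cast<double>(t) + 0.001 * k + c);
      }
    }
  }
  return table;
}
```

   For example, `GenerateTable(TableSpec{0, 3600, 2, 500, 20})` would describe 
one hour of data with a row every 2s for each of 500 keys and 20 payload 
columns.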
   
   There are also quite a large number of `BENCHMARK_CAPTURE` statements, as an 
immediate workaround to some limitations in Google Benchmark. I haven't found 
a great non-verbose way to pass in the parameters needed (strings and vectors) 
while also having readable titles and details about the benchmark being written 
to the output file. Let me know if you have any advice about this or know 
someone who does.
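
   One possible direction, sketched here under assumptions, is to describe 
each case as a struct, build a readable name from it, and register the cases in 
a loop instead of writing one `BENCHMARK_CAPTURE` per case. The types and 
functions below (`JoinCase`, `MakeBenchmarkName`, `MakeCases`) are hypothetical 
and use only the standard library. Whether this fits the benchmark in this PR 
is untested.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical description of one benchmark case.
struct JoinCase {
  std::string density;                    // label such as "2s" or "1h"
  int width;                              // number of columns
  int num_keys;                           // number of tids
  std::vector<std::string> right_tables;  // paths of the right-hand tables
};

// Builds a readable title, for example
// "AsofJoin/density:2s/width:10/tids:100/right_tables:2".
std::string MakeBenchmarkName(const JoinCase& c) {
  std::ostringstream name;
  name << "AsofJoin/density:" << c.density << "/width:" << c.width
       << "/tids:" << c.num_keys
       << "/right_tables:" << c.right_tables.size();
  return name.str();
}

// Cross product of the parameter lists; every case shares the same
// right-hand tables.
std::vector<JoinCase> MakeCases(const std::vector<std::string>& densities,
                                const std::vector<int>& widths,
                                const std::vector<int>& key_counts,
                                const std::vector<std::string>& right_tables) {
  assert(!densities.empty() && !widths.empty() && !key_counts.empty());
  std::vector<JoinCase> cases;
  for (const auto& density : densities) {
    for (int width : widths) {
      for (int num_keys : key_counts) {
        cases.push_back(JoinCase{density, width, num_keys, right_tables});
      }
    }
  }
  return cases;
}
```

   Each generated case could then be passed, together with its name, to 
Google Benchmark's `benchmark::RegisterBenchmark` from a custom `main` before 
calling `benchmark::RunSpecifiedBenchmarks()`, so that the name appears in the 
output file. That wiring is left out of the sketch.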

