iChauster commented on PR #13426:
URL: https://github.com/apache/arrow/pull/13426#issuecomment-1171363759

   Thanks for the help so far, @westonpace !
   > Does the table density get applied uniformly over all input columns? In 
other words, do you worry about cases where one input is very dense and the 
others are not so dense?
   If you're referring to multi-table joins (a left-hand table joined with 
one or more right-hand tables), all right-hand tables share the same 
properties. I think it would be interesting to write a few benchmarks where the 
right-hand tables vary more among themselves.
   
   If you're referring to variation within a single table, our benchmarking 
approach currently doesn't test joins with high time frequency and low id 
density (or vice versa). We have set some 'baselines' and try to vary only 
one property at a time. For example, if we are interested in time 
frequency, we set the remaining properties to the baseline and test time 
frequency in the symmetric case (the LH and RH tables have the same time 
frequency, for various values) and in the asymmetric case.
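   
   As a rough illustration of that one-property-at-a-time scheme, here is a 
minimal Python sketch. The property names, the baseline values, and the 
`one_at_a_time` helper are all hypothetical and are not taken from the actual 
benchmark code. It also assumes that in the asymmetric case the LH table stays 
at the baseline while the RH table varies.

```python
# Hypothetical baseline values; the real benchmark baselines are not
# stated in this thread.
BASELINE = {"time_frequency": 1.0, "key_density": 0.5, "num_columns": 20}


def one_at_a_time(prop, values, baseline=BASELINE):
    """Yield (case, left_props, right_props), varying only `prop`."""
    for v in values:
        # Symmetric case: LH and RH tables take the same value of `prop`.
        yield ("symmetric", {**baseline, prop: v}, {**baseline, prop: v})
    for v in values:
        if v == baseline[prop]:
            continue  # would duplicate one of the symmetric configurations
        # Asymmetric case (assumption): LH stays at baseline, RH varies.
        yield ("asymmetric", dict(baseline), {**baseline, prop: v})


configs = list(one_at_a_time("time_frequency", [0.5, 1.0, 2.0]))
for case, left, right in configs:
    print(case, left["time_frequency"], right["time_frequency"])
```

   Every property other than the one being swept stays at its baseline value 
in both tables, so any change in runtime can be attributed to that property.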
   
   Since performance seems to vary mostly with time frequency and key 
density, I can see us writing a few benchmarks where we vary both at the same time.
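   
   A joint sweep could be sketched as a simple cross product. This is only an 
illustration: the sweep values and the `joint_grid` helper are hypothetical, 
and a real benchmark would feed each combination into its table generator.

```python
from itertools import product

# Hypothetical sweep values, chosen only for illustration.
TIME_FREQUENCIES = [0.5, 1.0, 2.0]
KEY_DENSITIES = [0.1, 0.5, 1.0]


def joint_grid(time_frequencies, key_densities):
    """Return every (time_frequency, key_density) combination as a dict."""
    return [
        {"time_frequency": tf, "key_density": kd}
        for tf, kd in product(time_frequencies, key_densities)
    ]


grid = joint_grid(TIME_FREQUENCIES, KEY_DENSITIES)
print(len(grid), "combinations")
```

   Unlike a one-property-at-a-time sweep, this grid includes the corners, such 
as high time frequency combined with low key density.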
   
   > When you say multi-table joins how many tables are you testing? Or is that 
a parameter?
   This is a parameter. We are currently testing 1-table to 51-table joins for 
`AsOf` (the number of inputs to the `AsOfJoin` node).
   
   > number of columns and number of keys is good. Eventually you will need to 
worry about data types I would think (probably more for payload columns than 
for key columns)
   Yes -- although I think this would require both a slight refactor of the 
`AsOf` implementation and a heavier change in the way we generate our 
tables. Currently, we don't have much flexibility in data types in the 
bamboo streaming table generation.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
