iChauster commented on PR #13426: URL: https://github.com/apache/arrow/pull/13426#issuecomment-1171363759
Thanks for the help so far, @westonpace!

> Does the table density get applied uniformly over all input columns? In other words, do you worry about cases where one input is very dense and the others are not so dense?

If you're referring to multi-table joins (a left hand table joined with one or multiple right hand tables), all right hand tables share the same properties. I think it would be interesting to write a few benchmarks where the right hand tables have a bit more variation amongst themselves.

If you're referring to properties within the same table, our benchmarking approach currently doesn't test joins with high time frequency and low id density (and vice versa). We have set some 'baselines' and try to vary only one property at a time. For example, if we are interested in time frequency, we set the remaining properties to the baseline and test time frequency in the symmetric case (LH table and RH table have the same time frequency, for various values) and in the asymmetric case. Since performance seems to vary mostly with time frequency and key density, I can see a few benchmarks being written where we vary both at the same time.

> When you say multi-table joins how many tables are you testing? Or is that a parameter?

This is a parameter. We are currently testing joins of 1 to 51 tables for `AsOf` (the number of inputs to the `AsOfJoin` node).

> number of columns and number of keys is good. Eventually you will need to worry about data types I would think (probably more for payload columns than for key columns)

Yes, although I think this would require both a slight refactor of the `AsOf` implementation and a heavier change in the way we generate our tables. Currently, we don't have much flexibility in data types in the bamboo streaming table generation.
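A rough sketch of the one-property-at-a-time sweep described above, plus the joint sweep over time frequency and key density. Everything here is hypothetical and for illustration only: the `TableProps` fields, the baseline values, and the helper names are not taken from the actual benchmark code, and the asymmetric case is assumed to mean "LH table at baseline, RH table varied".

```python
from dataclasses import dataclass, replace
from itertools import product


@dataclass(frozen=True)
class TableProps:
    """Hypothetical description of one generated input table."""
    time_frequency: int  # assumed: rows per unit of time
    key_density: int     # assumed: number of distinct ids
    num_columns: int     # assumed: number of payload columns


# Hypothetical baseline; the real benchmark baselines are not stated here.
BASELINE = TableProps(time_frequency=1, key_density=10, num_columns=20)


def sweep(prop, values):
    """Vary a single property, holding the others at the baseline.

    Returns (case, left_props, right_props) tuples.
    """
    cases = []
    for value in values:
        varied = replace(BASELINE, **{prop: value})
        # Symmetric: LH and RH tables share the varied value.
        cases.append(("symmetric", varied, varied))
        # Asymmetric (assumed meaning): LH stays at baseline, RH varies.
        if varied != BASELINE:
            cases.append(("asymmetric", BASELINE, varied))
    return cases


def joint_sweep(frequencies, densities):
    """Vary time frequency and key density together."""
    return [
        replace(BASELINE, time_frequency=f, key_density=d)
        for f, d in product(frequencies, densities)
    ]


if __name__ == "__main__":
    for case, left, right in sweep("time_frequency", [1, 5, 10]):
        print(case, left, right)
```

Each resulting pair of table descriptions would then be handed to whatever table generator and join benchmark is in use; that wiring is left out because it depends on the generator's interface.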
