tyrelr commented on pull request #9440: URL: https://github.com/apache/arrow/pull/9440#issuecomment-774786663
Some rough performance numbers, taken before I squashed a couple of simple commits, look good. Differences of 15% or more seem to be beneficial or neutral for the comparison kernels. I can't explain the buffer collection regressions, as they should be completely unrelated (but they did reproduce in both of my test runs). The buffer_bit_ops and cast differences look like noise.

```
group                           2master-a321cded             2value_iteration-7c21fa33    master-a321cded              value_iteration-7c21fa33
-----                           ----------------             -------------------------    ---------------              ------------------------
Buffer::from_iter bool          1.00  6.1±0.01ms    ? B/sec  1.73  10.6±0.04ms   ? B/sec  1.03  6.3±0.01ms    ? B/sec  1.47  9.0±0.02ms    ? B/sec
MutableBuffer::from_iter bool   1.00  6.1±0.01ms    ? B/sec  1.42  8.7±0.01ms    ? B/sec  1.03  6.3±0.01ms    ? B/sec  1.46  8.9±0.02ms    ? B/sec
array_from_vec 128              1.06  419.7±2.63ns  ? B/sec  1.04  414.6±0.76ns  ? B/sec  1.00  397.0±1.55ns  ? B/sec  1.16  459.0±1.41ns  ? B/sec
buffer_bit_ops and              1.31  320.2±0.33ns  ? B/sec  1.00  243.7±0.27ns  ? B/sec  1.32  321.0±0.61ns  ? B/sec  1.31  318.3±0.70ns  ? B/sec
buffer_bit_ops or               1.00  277.1±0.31ns  ? B/sec  1.06  292.1±0.54ns  ? B/sec  1.37  378.3±0.41ns  ? B/sec  1.00  276.6±0.94ns  ? B/sec
cast date32 to date64 512       1.00  522.8±1.51ns  ? B/sec  1.00  523.2±0.65ns  ? B/sec  1.17  610.1±0.61ns  ? B/sec  1.18  615.5±0.83ns  ? B/sec
cast time32s to time32ms 512    1.00  343.6±0.74ns  ? B/sec  1.25  428.8±0.41ns  ? B/sec  1.24  426.6±0.42ns  ? B/sec  1.01  346.8±1.42ns  ? B/sec
eq Float32                      1.50  90.5±0.14µs   ? B/sec  1.00  60.2±0.09µs   ? B/sec  1.50  90.5±0.13µs   ? B/sec  1.00  60.2±0.07µs   ? B/sec
eq scalar Float32               1.35  79.6±0.18µs   ? B/sec  1.00  59.2±0.13µs   ? B/sec  1.35  79.8±0.09µs   ? B/sec  1.00  59.1±0.09µs   ? B/sec
from_slice                      1.81  900.0±1.74µs  ? B/sec  1.00  497.0±0.98µs  ? B/sec  1.76  875.2±1.17µs  ? B/sec  1.02  508.3±0.88µs  ? B/sec
gt Float32                      1.56  86.0±0.10µs   ? B/sec  1.00  55.0±0.07µs   ? B/sec  1.56  86.0±0.10µs   ? B/sec  1.00  55.1±0.24µs   ? B/sec
gt scalar Float32               1.38  72.3±0.14µs   ? B/sec  1.00  52.3±0.05µs   ? B/sec  1.38  72.3±0.19µs   ? B/sec  1.00  52.2±0.04µs   ? B/sec
gt_eq Float32                   1.55  75.4±0.15µs   ? B/sec  1.00  48.6±0.09µs   ? B/sec  1.55  75.5±0.09µs   ? B/sec  1.00  48.7±0.08µs   ? B/sec
gt_eq scalar Float32            1.32  62.8±0.07µs   ? B/sec  1.00  47.5±0.07µs   ? B/sec  1.33  63.0±0.07µs   ? B/sec  1.00  47.5±0.06µs   ? B/sec
limit 512, 512                  1.00  116.3±0.18ns  ? B/sec  1.15  133.9±0.26ns  ? B/sec  1.00  116.3±0.22ns  ? B/sec  1.08  126.0±0.23ns  ? B/sec
lt Float32                      1.56  85.8±0.08µs   ? B/sec  1.00  55.0±0.09µs   ? B/sec  1.56  86.1±0.19µs   ? B/sec  1.00  55.1±0.16µs   ? B/sec
lt scalar Float32               1.35  71.7±0.11µs   ? B/sec  1.00  53.1±0.06µs   ? B/sec  1.35  71.7±0.18µs   ? B/sec  1.00  53.0±0.05µs   ? B/sec
lt_eq Float32                   1.55  75.7±0.09µs   ? B/sec  1.00  48.8±0.09µs   ? B/sec  1.55  75.7±0.13µs   ? B/sec  1.00  48.7±0.06µs   ? B/sec
lt_eq scalar Float32            1.35  62.0±0.06µs   ? B/sec  1.00  46.1±0.04µs   ? B/sec  1.35  62.0±0.09µs   ? B/sec  1.01  46.4±0.08µs   ? B/sec
mutable                         1.00  419.1±0.99µs  ? B/sec  4.01  1682.2±2.66µs ? B/sec  1.05  441.9±0.95µs  ? B/sec  3.83  1606.8±2.63µs ? B/sec
mutable extend                  1.00  833.1±1.97µs  ? B/sec  2.48  2.1±0.00ms    ? B/sec  1.00  836.2±1.22µs  ? B/sec  2.45  2.0±0.00ms    ? B/sec
mutable iter extend_from_slice  1.00  1004.0±1.32µs ? B/sec  2.19  2.2±0.00ms    ? B/sec  1.00  1003.5±1.56µs ? B/sec  2.19  2.2±0.00ms    ? B/sec
neq Float32                     1.50  90.4±0.12µs   ? B/sec  1.00  60.1±0.09µs   ? B/sec  1.51  90.6±0.12µs   ? B/sec  1.00  60.2±0.06µs   ? B/sec
neq scalar Float32              1.36  79.8±0.10µs   ? B/sec  1.00  58.6±0.14µs   ? B/sec  1.36  79.9±0.13µs   ? B/sec  1.00  58.6±0.06µs   ? B/sec
nlike_utf8 scalar ends with     1.16  580.9±0.50µs  ? B/sec  1.00  500.9±0.73µs  ? B/sec  1.16  579.4±0.67µs  ? B/sec  1.00  499.6±0.58µs  ? B/sec
or                              1.00  1007.5±1.33ns ? B/sec  1.07  1078.1±3.76ns ? B/sec  1.11  1116.4±1.79ns ? B/sec  1.17  1174.4±2.85ns ? B/sec
```

Since the comparison kernels could achieve a similar iteration speedup WITHOUT any kind of public TypedArray trait, the speedup on its own isn't enough to justify creating the API. I plan to experiment further to see whether other places that rely on cross-type array iteration could benefit.

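
To make the discussion concrete, here is a minimal sketch of what a typed-array trait of this kind could look like, with one comparison helper written against it. It is self-contained and does not use the arrow crate: `TypedArray`, `ToyPrimitiveArray`, and `compare` are hypothetical names chosen for illustration, not the API in this PR.

```rust
// Hypothetical sketch: none of these names come from the arrow crate or this PR.

/// An array whose Rust type says what kind of value it holds.
trait TypedArray {
    type Value: Copy;

    fn len(&self) -> usize;
    /// Value stored at index `i` (unspecified, but safe to read, for null slots).
    fn value(&self, i: usize) -> Self::Value;
    /// Whether the slot at index `i` is non-null.
    fn is_valid(&self, i: usize) -> bool;
}

/// Toy stand-in for a primitive array: values plus a validity mask.
struct ToyPrimitiveArray<T> {
    values: Vec<T>,
    validity: Vec<bool>,
}

impl<T: Copy> TypedArray for ToyPrimitiveArray<T> {
    type Value = T;

    fn len(&self) -> usize {
        self.values.len()
    }
    fn value(&self, i: usize) -> T {
        self.values[i]
    }
    fn is_valid(&self, i: usize) -> bool {
        self.validity[i]
    }
}

/// Element-wise comparison written once against the trait.
/// `None` marks a null result (either input slot was null).
fn compare<A, B, F>(left: &A, right: &B, op: F) -> Vec<Option<bool>>
where
    A: TypedArray,
    B: TypedArray<Value = A::Value>,
    F: Fn(A::Value, A::Value) -> bool,
{
    assert_eq!(left.len(), right.len(), "arrays must have equal length");
    (0..left.len())
        .map(|i| {
            if left.is_valid(i) && right.is_valid(i) {
                Some(op(left.value(i), right.value(i)))
            } else {
                None
            }
        })
        .collect()
}
```

With this shape, `compare(&a, &b, |x, y| x == y)` and `compare(&a, &b, |x, y| x < y)` share one loop over any pair of arrays with the same `Value` type. The sketch says nothing about performance; it only shows the kind of code such a trait would allow.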
The main questions in my mind are:

* struct, union, list, and dictionary arrays don't encode the type of value they contain in their Rust type, so they don't fit into the API very well... is it generally useful without them? Is there a natural way to make it work for them?
* does it enable simpler code elsewhere, or is it targeting too narrow a use case (string formatting? a reusable array.map-style utility function? chaining a series of functions/operators together before storing an item back into an arrow array?)
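
The first question can be illustrated with a toy model of the problem. In the sketch below, the dictionary's Rust type records only its key type, so a typed view has to be recovered with a runtime downcast. This is one possible direction, not a proposal: every name here (`AnyArray`, `ToyStringArray`, `ToyDictionaryArray`, `TypedDictionary`, `as_string_dictionary`) is hypothetical and only mimics the shape of the real types.

```rust
use std::any::Any;
use std::sync::Arc;

// Hypothetical sketch; these toy types only mimic the shape of the problem.

/// Type-erased array handle: the concrete type is only known at runtime.
trait AnyArray: Any {
    fn as_any(&self) -> &dyn Any;
}

struct ToyStringArray {
    values: Vec<String>,
}

impl AnyArray for ToyStringArray {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// The Rust type records the key type `K` but not what the values are.
struct ToyDictionaryArray<K> {
    keys: Vec<K>,
    values: Arc<dyn AnyArray>,
}

/// A view pairing the keys with a concretely typed values array,
/// which makes the value type visible to the compiler again.
struct TypedDictionary<'a> {
    keys: &'a [usize],
    values: &'a ToyStringArray,
}

impl<'a> TypedDictionary<'a> {
    /// Look up the string that the key at index `i` points to.
    fn value(&self, i: usize) -> &'a str {
        &self.values.values[self.keys[i]]
    }
}

/// Runtime downcast: returns `None` if the values are not strings.
fn as_string_dictionary(dict: &ToyDictionaryArray<usize>) -> Option<TypedDictionary<'_>> {
    let values = dict.values.as_any().downcast_ref::<ToyStringArray>()?;
    Some(TypedDictionary {
        keys: &dict.keys,
        values,
    })
}
```

The cost of this approach is that the caller must name the expected value type up front and handle the `None` case, which is exactly the friction the question is asking about.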
