asubiotto commented on PR #9558: URL: https://github.com/apache/arrow-rs/pull/9558#issuecomment-4252138274
Sorry for dropping the ball on this! I think this is going in the right direction but when I pulled this in to try it out I realized that it doesn't work very well when interleaving listviews with a high number of shraed elements (i.e. offset/size windows are overlapping). I think we can get the best of both worlds by computing a heuristic: i.e. how many values are referenced vs how many values are in the backing array to figure out if we want to do per-row copies as this pr does or just a full concat of the backing slice which preserves overlapping encodings and can be much cheaper in the end. Here is a commit that implements that on top of this PR with a benchmark: https://github.com/polarsignals/arrow-rs/commit/7cb688012a94b098abc2659237a8900b611846c3 There is a slight perf hit vs your branch to compute the heuristic (summing referenced sizes), but I think it's worth it in the grand scheme of things: ``` interleave list_view<i64>(0.1,0.1,20) 100 [0..100, 100..230, 450..1000] time: [2.8553 µs 2.8661 µs 2.8777 µs] change: [−39.782% −39.429% −39.058%] (p = 0.00 < 0.05) interleave list_view<i64>(0.1,0.1,20) 400 [0..100, 100..230, 450..1000] time: [8.2066 µs 8.2440 µs 8.2838 µs] change: [−41.803% −41.460% −41.123%] (p = 0.00 < 0.05) interleave list_view<i64>(0.1,0.1,20) 1024 [0..100, 100..230, 450..1000] time: [22.291 µs 22.424 µs 22.580 µs] change: [−39.377% −38.883% −38.328%] (p = 0.00 < 0.05) interleave list_view<i64>(0.1,0.1,20) 1024 [0..100, 100..230, 450..1000, 0..1000] time: [21.744 µs 21.868 µs 22.003 µs] change: [−40.397% −39.966% −39.515%] (p = 0.00 < 0.05) interleave list_view<i64>(0.0,0.0,20) 100 [0..100, 100..230, 450..1000] time: [1.7642 µs 1.7770 µs 1.7937 µs] change: [−36.120% −35.680% −35.219%] (p = 0.00 < 0.05) interleave list_view<i64>(0.0,0.0,20) 400 [0..100, 100..230, 450..1000] time: [5.1748 µs 5.2052 µs 5.2392 µs] change: [−29.000% −28.500% −28.000%] (p = 0.00 < 0.05) interleave list_view<i64>(0.0,0.0,20) 1024 [0..100, 100..230, 450..1000] time: [12.528 µs 12.631 µs 12.741 µs] change: [−29.293% −28.511% −27.801%] (p = 0.00 < 0.05) interleave list_view<i64>(0.0,0.0,20) 1024 [0..100, 100..230, 450..1000, 0..1000] time: [13.009 µs 13.098 µs 13.192 µs] change: [−26.841% −26.193% −25.550%] (p = 0.00 < 0.05) interleave list_view_overlapping<i64>(80x,20) 100 [0..100, 100..230, 450..1000] time: [1.8046 µs 1.8271 µs 1.8547 µs] change: [−44.472% −43.715% −42.935%] (p = 0.00 < 0.05) interleave list_view_overlapping<i64>(80x,20) 400 [0..100, 100..230, 450..1000] time: [3.3896 µs 3.4283 µs 3.4689 µs] change: [−66.387% −66.073% −65.773%] (p = 0.00 < 0.05) interleave list_view_overlapping<i64>(80x,20) 1024 [0..100, 100..230, 450..1000] time: [5.7748 µs 5.8133 µs 5.8482 µs] change: [−72.104% −71.879% −71.641%] (p = 0.00 < 0.05) interleave list_view_overlapping<i64>(80x,20) 1024 [0..100, 100..230, 450..1000, 0..1000] time: [6.2896 µs 6.3539 µs 6.4243 µs] change: [−69.684% −69.377% −69.083%] (p = 0.00 < 0.05) ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
