jorgecarleitao opened a new pull request #8842: URL: https://github.com/apache/arrow/pull/8842
This PR creates a new struct `BooleanArray`, that replaces `PrimitiveArray<BooleanType>`, so that we do not have to consider the differences between being bit-packed and non-bit packed. This difference is causing a performance degradation described on ARROW-10453 and #8837 . This usage of different logic is already observed in most of our kernels, as the code for byte-width and bit-packed is almost always different, due to how offsets are computed. With this PR, that offset computation no longer depends on bit-packed vs non-bit-packed. Micro benchmarks (worse to best, statistically insignificant ignored): | benchmark | variation | |-------------- | -------------- | | min nulls 512 | 33.7 | | record_batches_to_csv | 23.1 | | array_string_from_vec 256 | 5.6 | | array_string_from_vec 512 | 5.2 | | take bool nulls 512 | 4.9 | | cast int32 to int64 512 | 2.5 | | equal_512 | 2.3 | | filter u8 very low selectivity | 2.2 | | array_slice 512 | 2.1 | | take bool nulls 1024 | 2.0 | | cast int64 to int32 512 | 1.6 | | min 512 | 1.6 | | take i32 512 | 1.1 | | add 512 | 1.1 | | array_slice 2048 | 1.0 | | length | 1.0 | | filter u8 low selectivity | 0.9 | | filter u8 high selectivity | 0.9 | | array_string_from_vec 128 | 0.9 | | cast int32 to float64 512 | 0.9 | | cast timestamp_ms to i64 512 | 0.8 | | take str null indices 512 | 0.6 | | sum 512 | 0.4 | | filter context u8 very low selectivity | -0.7 | | take i32 1024 | -0.9 | | filter context f32 very low selectivity | -0.9 | | cast float64 to float32 512 | -1.0 | | equal_nulls_512 | -1.0 | | cast time32s to time32ms 512 | -1.1 | | sort 2^12 | -1.2 | | struct_array_from_vec 128 | -1.4 | | array_from_vec 256 | -1.4 | | array_from_vec 128 | -1.5 | | filter context u8 high selectivity | -1.6 | | limit 512, 512 | -1.7 | | equal_string_nulls_512 | -1.8 | | take i32 nulls 1024 | -1.8 | | struct_array_from_vec 512 | -1.9 | | filter context f32 high selectivity | -2.0 | | cast timestamp_ms to timestamp_ns 512 | -2.2 | | take i32 nulls 512 | -2.3 | | buffer_bit_ops or | -2.4 | | array_from_vec 512 | -2.6 | | cast float64 to uint64 512 | -2.7 | | take str 512 | -2.8 | | min nulls string 512 | -3.1 | | cast int32 to int32 512 | -3.3 | | array_slice 128 | -3.3 | | filter context u8 w NULLs very low selectivity | -3.3 | | buffer_bit_ops and | -3.4 | | struct_array_from_vec 256 | -4.2 | | cast int32 to uint32 512 | -4.5 | | multiply 512 | -5.2 | | equal_string_512 | -5.5 | | take str null values null indices 1024 | -6.8 | | sum nulls 512 | -13.3 | | add_nulls_512 | -17.6 | | like_utf8 scalar contains | -17.8 | | nlike_utf8 scalar contains | -17.9 | | nlike_utf8 scalar complex | -24.6 | | like_utf8 scalar complex | -25.2 | | cast time64ns to time32s 512 | -42.7 | | cast date64 to date32 512 | -49.1 | | cast date32 to date64 512 | -50.7 | | nlike_utf8 scalar starts with | -51.1 | | nlike_utf8 scalar ends with | -55.1 | | like_utf8 scalar ends with | -55.5 | | like_utf8 scalar starts with | -56.3 | | nlike_utf8 scalar equals | -67.8 | | like_utf8 scalar equals | -74.2 | | eq Float32 | -75.7 | | gt_eq Float32 | -76.1 | | lt_eq Float32 | -76.5 | | not | -77.1 | | and | -78.6 | | or | -78.7 | | lt_eq scalar Float32 | -79.4 | | eq scalar Float32 | -82.1 | | neq Float32 | -82.1 | | lt scalar Float32 | -82.1 | | lt Float32 | -82.3 | | gt Float32 | -82.4 | | gt_eq scalar Float32 | -82.4 | | neq scalar Float32 | -82.6 | | gt scalar Float32 | -84.7 | ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
