jecsand838 commented on code in PR #8254: URL: https://github.com/apache/arrow-rs/pull/8254#discussion_r2312110172
##########
arrow-avro/src/reader/record.rs:
##########
@@ -590,10 +590,23 @@ impl Decoder {
                         )));
                     }
                 }
+                // Extract the value field nullability from the schema
+                let is_value_nullable = match map_field.data_type() {
+                    DataType::Struct(fields) => fields
+                        .iter()
+                        .find(|f| f.name() == "value")
+                        .map(|f| f.is_nullable())
+                        .unwrap_or(false),
+                    _ => true, // default to nullable
+                };
                 let entries_struct = StructArray::new(
                     Fields::from(vec![
                         Arc::new(ArrowField::new("key", DataType::Utf8, false)),
-                        Arc::new(ArrowField::new("value", val_arr.data_type().clone(), true)),
+                        Arc::new(ArrowField::new(
+                            "value",
+                            val_arr.data_type().clone(),
+                            is_value_nullable,

Review Comment:
I took a slightly different approach in #8220 that avoids the field scan and new Field allocation while preserving the existing schema metadata. It's slightly more performant, especially on smaller batches:

```
Map/100                 time:   [6.2237 µs 6.2594 µs 6.2899 µs]
                        thrpt:  [507.32 MiB/s 509.79 MiB/s 512.71 MiB/s]
                 change:
                        time:   [−0.3038% +0.5799% +1.4930%] (p = 0.19 > 0.05)
                        thrpt:  [−1.4710% −0.5766% +0.3047%]
                        No change in performance detected.
Map/10000               time:   [250.40 µs 253.75 µs 258.62 µs]
                        thrpt:  [1.2573 GiB/s 1.2814 GiB/s 1.2986 GiB/s]
                 change:
                        time:   [−2.5344% −1.1670% +0.1631%] (p = 0.08 > 0.05)
                        thrpt:  [−0.1628% +1.1808% +2.6003%]
                        No change in performance detected.
Found 6 outliers among 25 measurements (24.00%)
  6 (24.00%) low mild
Map/1000000             time:   [252.99 µs 255.93 µs 260.24 µs]
                        thrpt:  [130.21 GiB/s 132.40 GiB/s 133.94 GiB/s]
                 change:
                        time:   [−1.9418% −0.4373% +1.0680%] (p = 0.60 > 0.05)
                        thrpt:  [−1.0568% +0.4393% +1.9803%]
                        No change in performance detected.
```

vs

```
Map/100                 time:   [6.4487 µs 6.4584 µs 6.4687 µs]
                        thrpt:  [493.30 MiB/s 494.09 MiB/s 494.83 MiB/s]
                 change:
                        time:   [+4.2911% +5.0113% +5.7191%] (p = 0.00 < 0.05)
                        thrpt:  [−5.4097% −4.7721% −4.1145%]
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
Map/10000               time:   [258.90 µs 260.78 µs 263.17 µs]
                        thrpt:  [1.2356 GiB/s 1.2469 GiB/s 1.2559 GiB/s]
                 change:
                        time:   [−0.6514% +0.8318% +2.5189%] (p = 0.34 > 0.05)
                        thrpt:  [−2.4570% −0.8249% +0.6557%]
                        No change in performance detected.
Found 1 outliers among 25 measurements (4.00%)
  1 (4.00%) high severe
Map/1000000             time:   [265.48 µs 268.25 µs 270.56 µs]
                        thrpt:  [125.24 GiB/s 126.33 GiB/s 127.64 GiB/s]
                 change:
                        time:   [+3.0081% +4.1202% +5.2124%] (p = 0.00 < 0.05)
                        thrpt:  [−4.9542% −3.9572% −2.9203%]
                        Performance has regressed.
Found 1 outliers among 10 measurements (10.00%)
```

If you wanted to check it out: https://github.com/apache/arrow-rs/blob/ebf402915511308201ef5fbb92368d696ee50ff5/arrow-avro/src/reader/record.rs#L685
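
For illustration only, here is a minimal std-only sketch of the difference between scanning and rebuilding the entries fields and reusing the ones already held by the schema. The `Field`, `FieldRef`, and `Fields` types and the `field`, `rebuild_entries`, and `reuse_entries` helpers are simplified, hypothetical stand-ins, not the `arrow-schema` types and not the actual code in #8220; the sketch only shows why reusing shared field references can keep metadata and skip a fresh allocation.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Simplified stand-ins for Arrow's field types (assumption: NOT arrow-schema).
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    nullable: bool,
    metadata: HashMap<String, String>,
}

type FieldRef = Arc<Field>;
type Fields = Arc<[FieldRef]>;

// Hypothetical helper that builds a reference-counted field.
fn field(name: &str, nullable: bool, metadata: &[(&str, &str)]) -> FieldRef {
    Arc::new(Field {
        name: name.to_string(),
        nullable,
        metadata: metadata
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect(),
    })
}

// Mirrors the shape of the diff: scan for "value", then allocate new fields.
// The freshly built fields carry no metadata.
fn rebuild_entries(schema_fields: &Fields) -> Fields {
    let is_value_nullable = schema_fields
        .iter()
        .find(|f| f.name == "value")
        .map(|f| f.nullable)
        .unwrap_or(false);
    vec![
        field("key", false, &[]),
        field("value", is_value_nullable, &[]),
    ]
    .into()
}

// Alternative: share the schema's fields; only a reference count is bumped.
fn reuse_entries(schema_fields: &Fields) -> Fields {
    Arc::clone(schema_fields)
}

fn main() {
    let schema_fields: Fields = vec![
        field("key", false, &[]),
        field("value", false, &[("doc", "non-null map value")]),
    ]
    .into();

    let rebuilt = rebuild_entries(&schema_fields);
    let reused = reuse_entries(&schema_fields);

    // Both approaches agree on nullability of the value field.
    assert_eq!(rebuilt[1].nullable, reused[1].nullable);
    // Only the reused fields keep the schema metadata.
    assert!(rebuilt[1].metadata.is_empty());
    assert_eq!(
        reused[1].metadata.get("doc").map(String::as_str),
        Some("non-null map value")
    );
    // The reused field is the same allocation as the schema's field.
    assert!(Arc::ptr_eq(&reused[1], &schema_fields[1]));
}
```

In this toy model the reuse path does no per-batch field construction, which is the kind of saving that would matter most when batches are small.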