jecsand838 commented on code in PR #8349: URL: https://github.com/apache/arrow-rs/pull/8349#discussion_r2365872073
########## arrow-avro/src/reader/record.rs: ########## @@ -486,6 +734,70 @@ impl Decoder { Self::Decimal256(_, _, _, builder) => builder.append_value(i256::ZERO), Self::Enum(indices, _, _) => indices.push(0), Self::Duration(builder) => builder.append_null(), + Self::Union(fields, type_ids, offsets, encodings, encoding_counts, None) => { + let mut chosen = None; + for (i, ch) in encodings.iter().enumerate() { + if matches!(ch, Decoder::Null(_)) { + chosen = Some(i); + break; + } + } + let idx = chosen.unwrap_or(0); + let type_id = fields + .iter() + .nth(idx) + .map(|(type_id, _)| type_id) + .unwrap_or_else(|| i8::try_from(idx).unwrap_or(0)); Review Comment: > Actually, maybe I do understand! > > If NULL is one of the union branches, emit that directly > Otherwise, default to the first branch (whatever that is), and emit a NULL value for it > Is that correct? > > (but if so, then I don't understand why idx could ever be out of bounds?) That's correct. I'm working on cleaning this up now by making the invariants explicit at construction time and erroring if the union is empty or if `encodings.len() != fields.len()`. Then I'll remove the `i8::try_from(idx)` fallback and only compute `type_id` from `fields[idx]`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org