JingsongLi commented on PR #276:
URL: https://github.com/apache/paimon-rust/pull/276#issuecomment-4301567204

   @luoyuxia Thanks for the contribution!
   
   Left comments from Claude 3.7:
     1. from_avro_bytes collects all records into Value::Array then 
deserializes the whole array (objects_file.rs, manifest.rs, index_manifest.rs). 
This creates an intermediate Vec<Value> plus a wrapping Value::Array, then 
from_value walks it again. For large files this doubles memory. Consider 
deserializing each record individually:
     reader
         .map(|r| r.map_err(Error::from).and_then(|v| 
from_value(&v).map_err(Error::from)))
         .collect::<Result<Vec<T>>>()
     1. This avoids the intermediate collection entirely.
     2. get_field_at formatting — the match arm body has inconsistent brace 
style that may trip rustfmt:
     Value::Record(fields) => if let Some(i) = idx {
         fields.get(i).map(|(_, v)| v)
     } else {
         fields.iter().find(|(n, _)| n == name).map(|(_, v)| v)
     }
     .and_then(unwrap_value),
     2. The .and_then(unwrap_value) is chained on the if let expression, which 
is valid but hard to read. Wrapping the if let in braces or extracting to a 
local would be clearer.
     3. field_index returns None on empty slices — this is fine, but 
get_field_at with idx = None falls back to linear search. If the field 
genuinely doesn't exist in the schema, every row will do a linear scan and find 
nothing. Not a bug, but worth noting the fallback is O(n·m) in the degenerate 
case (n rows, m fields).
     4. build_map_column semantic change — the old code used get_field_raw (no 
union unwrapping) for maps, because AvroValue::Object was ambiguous between 
unions and maps. The new code uses get_field_at which calls unwrap_value and 
thus unwraps unions. With apache_avro::types::Value, Map and Union are distinct 
variants, so this is actually correct now. Just
     flagging it as a subtle behavioral change that's worth a test if one 
doesn't exist.
     5. DataUnexpected error variant (error.rs) — the message field is always 
set to "" in the From<apache_avro::Error> impl. Either drop the message field 
from this variant (use only source), or populate it with something useful like 
"avro error". An empty string in display("Paimon hitting unexpected avro error 
{}: {:?}", message, source) looks odd in logs.
     6. apache-avro 0.21 — HashMap iteration order in Value::Map — Avro maps 
use HashMap internally, so iteration order in build_map_column is 
non-deterministic. The old code also used HashMap, so this isn't a regression, 
but it's worth noting that map key/value ordering in the Arrow MapArray output 
is not guaranteed to be stable.
     7. Minor: index_manifest.rs test — the roundtrip test now manually unwraps 
Value::Union(_, inner). This is fine but somewhat fragile if the schema 
changes. Consider using from_value directly on the decoded value since 
from_value should handle union unwrapping.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to