jecsand838 commented on code in PR #8274:
URL: https://github.com/apache/arrow-rs/pull/8274#discussion_r2322622541
##########
arrow-avro/src/schema.rs:
##########
@@ -370,6 +371,49 @@ impl AvroSchema {
pub fn fingerprint(&self) -> Result<Fingerprint, ArrowError> {
generate_fingerprint_rabin(&self.schema()?)
}
+
+ /// Build Avro JSON from an Arrow [`ArrowSchema`], applying the given null‑union order.
+ ///
+ /// If the input Arrow schema already contains Avro JSON in
+ /// [`SCHEMA_METADATA_KEY`], that JSON is returned verbatim to preserve
+ /// the exact header encoding alignment; otherwise, a new JSON is generated
+ /// honoring `null_union_order` at **all nullable sites**.
+ pub fn from_arrow_with_options(
+ schema: &ArrowSchema,
+ null_union_order: Option<Nullability>,
+ ) -> Result<AvroSchema, ArrowError> {
+ if let Some(json) = schema.metadata.get(SCHEMA_METADATA_KEY) {
+ return Ok(AvroSchema::new(json.clone()));
+ }
+ let order = null_union_order.unwrap_or(Nullability::NullFirst);
+ let mut name_gen = NameGenerator::default();
+ let fields_json = schema
+ .fields()
+ .iter()
+ .map(|f| arrow_field_to_avro_with_order(f, &mut name_gen, order))
+ .collect::<Result<Vec<_>, _>>()?;
+ let record_name = schema
+ .metadata
+ .get(AVRO_NAME_METADATA_KEY)
+ .map_or("topLevelRecord", |s| s.as_str());
Review Comment:
> aside: Is this a well-known default name? Or just an arbitrary naming choice by this package?
While not an Avro‑spec default, `topLevelRecord` is used as a de‑facto
default because several popular tools (notably Spark/Databricks) default to the
same name when they synthesize an Avro schema from a struct/row.
> And does it actually matter in practice? (I guess if it mattered, the schema metadata would say so)?
Avro requires every record to have a name, so some name has to be emitted for
the schema to be valid. Also, because the record name is part of the schema's
Parsing Canonical Form, changing a record's name changes the schema's fingerprint.
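
To make the fingerprint point concrete, here is a minimal sketch of the 64-bit Rabin fingerprint (CRC-64-AVRO) as described in the Avro specification, applied to two hand-written canonical-form strings that differ only in the record name. This is not the `generate_fingerprint_rabin` implementation from this crate; `fp_table`, `rabin_fingerprint`, and `canonical_form` are hypothetical helpers for illustration, and the single `id: long` field and the alternative name `topLevelStruct` are assumed examples.

```rust
/// Seed value ("EMPTY") for CRC-64-AVRO from the Avro specification.
const EMPTY: u64 = 0xc15d213aa4d7a795;

/// Build the 256-entry lookup table used by the fingerprint loop.
fn fp_table() -> [u64; 256] {
    let mut table = [0u64; 256];
    for i in 0..256 {
        let mut fp = i as u64;
        for _ in 0..8 {
            // XOR in EMPTY only when the low bit is set.
            fp = (fp >> 1) ^ (EMPTY & (fp & 1).wrapping_neg());
        }
        table[i] = fp;
    }
    table
}

/// 64-bit Rabin fingerprint over raw bytes (hypothetical helper).
fn rabin_fingerprint(data: &[u8]) -> u64 {
    let table = fp_table();
    let mut fp = EMPTY;
    for &b in data {
        fp = (fp >> 8) ^ table[((fp ^ b as u64) & 0xff) as usize];
    }
    fp
}

/// Hand-written canonical form of a one-field record (assumed example schema).
fn canonical_form(record_name: &str) -> String {
    format!(
        r#"{{"name":"{}","type":"record","fields":[{{"name":"id","type":"long"}}]}}"#,
        record_name
    )
}

fn main() {
    let a = rabin_fingerprint(canonical_form("topLevelRecord").as_bytes());
    let b = rabin_fingerprint(canonical_form("topLevelStruct").as_bytes());
    println!("topLevelRecord: {:016x}", a);
    println!("topLevelStruct: {:016x}", b);
    println!("fingerprints differ: {}", a != b);
}
```

The fields are identical in both schemas, so any difference between the two fingerprints comes from the record name alone, which is why the default name matters to any consumer that looks schemas up by fingerprint.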