c-thiel commented on code in PR #2188:
URL: https://github.com/apache/iceberg-rust/pull/2188#discussion_r3330065906
##########
crates/iceberg/src/writer/file_writer/parquet_writer.rs:
##########
@@ -191,20 +207,11 @@ impl SchemaVisitor for IndexByParquetPathName {
}
fn primitive(&mut self, _p: &PrimitiveType) -> Result<Self::T> {
- let full_name = self.field_names.iter().map(String::as_str).join(".");
- let field_id = self.field_id;
- if let Some(existing_field_id) = self.name_to_id.get(full_name.as_str()) {
- return Err(Error::new(
- ErrorKind::DataInvalid,
- format!(
- "Invalid schema: multiple fields for name {full_name}: {field_id} and {existing_field_id}"
- ),
- ));
- } else {
- self.name_to_id.insert(full_name, field_id);
- }
+ self.insert_current_path()
+ }
- Ok(())
+ fn variant(&mut self, _v: &VariantType) -> Result<Self::T> {
Review Comment:
Regarding 1)
iceberg-rust writes via `AsyncArrowWriter`, which derives the Parquet schema
from the Arrow schema. In parquet 58.1.0, that path only emits the VARIANT
annotation when the field carries the `parquet_variant_compute::VariantType`
extension type and `variant_experimental` is enabled (otherwise
`logical_type_for_struct` is a stub returning None). I couldn't find a public
per-field hook to inject the annotation onto a plain `Struct(Binary,Binary)`.
So the real cost is: enable `variant_experimental` and attach the extension
type to the field. Two risks that I see:
- Turning on the feature may change how the reader decodes a
VARIANT-annotated group (native VariantArray instead of Struct{metadata,value}),
which could break the current read path that expects the struct.
- New experimental dep surface.
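To make the gating above concrete, here is a rough, std-only toy model of it. This is not the parquet crate's code: `FieldModel`, `LogicalTypeModel`, and `logical_type_for_struct_model` are hypothetical names, and the extension name `arrow.parquet.variant` is my assumption and should be checked against `parquet_variant_compute`. The only part taken from Arrow itself is the `ARROW:extension:name` field-metadata key.

```rust
use std::collections::HashMap;

// Arrow stores an extension type's name under this field-metadata key.
const EXTENSION_NAME_KEY: &str = "ARROW:extension:name";
// ASSUMPTION: the name the variant extension type registers under.
const VARIANT_EXTENSION_NAME: &str = "arrow.parquet.variant";

/// Hypothetical stand-in for an Arrow struct field; only metadata is modelled.
struct FieldModel {
    metadata: HashMap<String, String>,
}

impl FieldModel {
    /// A plain Struct(Binary, Binary) field with no extension metadata.
    fn plain_struct() -> Self {
        FieldModel { metadata: HashMap::new() }
    }

    /// The same field with the (assumed) variant extension name attached.
    fn with_variant_extension() -> Self {
        let mut metadata = HashMap::new();
        metadata.insert(
            EXTENSION_NAME_KEY.to_string(),
            VARIANT_EXTENSION_NAME.to_string(),
        );
        FieldModel { metadata }
    }
}

/// Hypothetical stand-in for the Parquet logical type annotation.
#[derive(Debug, PartialEq)]
enum LogicalTypeModel {
    Variant,
}

/// Toy model of the behaviour described above: the annotation is emitted only
/// when the feature is on AND the field carries the extension type.
fn logical_type_for_struct_model(
    field: &FieldModel,
    variant_experimental: bool,
) -> Option<LogicalTypeModel> {
    if !variant_experimental {
        // Models the stub that returns None when the feature is disabled.
        return None;
    }
    match field.metadata.get(EXTENSION_NAME_KEY).map(String::as_str) {
        Some(VARIANT_EXTENSION_NAME) => Some(LogicalTypeModel::Variant),
        _ => None,
    }
}
```

In this model a plain struct never gets the annotation, and a tagged field gets it only with the feature flag on, which is why both changes would be needed together.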
Not sure how we should proceed. Maybe this should be a separate issue?
I created this for now:
https://github.com/apache/iceberg-rust/issues/2546
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]