shenodaguirguis commented on pull request #2496: URL: https://github.com/apache/iceberg/pull/2496#issuecomment-828701983
> I agree w/ @RussellSpitzer on complex types. How would a deeply nested structure look w/ default types?
>
> I can definitely see the value in default values but I am having a hard time figuring out all the downstream effects: how does Spark handle these, how do they work in the Arrow based vectorised readers etc. Some more context on the effects on other systems would be useful for me to understand this change better.

Thanks for the question, @rymurr. I see your concern. In this change, default values are added as optional, so that individual readers do not have to worry about handling them. I plan to handle default values in Spark's Avro, ORC, and Parquet readers, and we can tackle them one by one. I don't foresee any complications, except maybe for the ser/deser of complex types.

For the Spark Avro reader, which I will start with, here is how to handle default values:

1. Schema conversion: converting an Avro schema to Iceberg types takes place in AvroSchemaUtil::convert(Schema), which takes an avro.Schema and uses the SchemaToType Avro schema visitor to perform the conversion. To copy over the default value, we need to modify the SchemaToType::record() method to handle the case where the field has a default value and use the new NestedField API for defaultValue.

2. BuildAvroProjection.record() changes: this is the code path that maps the read schema. Here we need to add handling for the case where the field has a default value, and construct the read Avro schema correctly with default values.

3. Reader-side changes: since the read Avro schema was constructed with default values in BuildAvroProjection.record() in the previous step, the Avro libraries will fill in the default value for any such field that is not present in the written data.
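To illustrate the reader-side behavior in step 3, here is a minimal, library-free sketch of how a read schema with defaults can fill in fields that are missing from the written record. It does not use the Avro or Iceberg APIs: ReadField, project(), and the field names are hypothetical and exist only for this illustration. The assumption is that a missing field with a default takes the default, and a missing field without one is an error.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DefaultValueProjectionSketch {

    /** Hypothetical stand-in for a read-schema field that may carry a default value. */
    static final class ReadField {
        final String name;
        final boolean hasDefault;
        final Object defaultValue;

        private ReadField(String name, boolean hasDefault, Object defaultValue) {
            this.name = name;
            this.hasDefault = hasDefault;
            this.defaultValue = defaultValue;
        }

        static ReadField withoutDefault(String name) {
            return new ReadField(name, false, null);
        }

        static ReadField withDefault(String name, Object value) {
            return new ReadField(name, true, value);
        }
    }

    /**
     * Projects a written record onto the read schema. Fields present in the
     * written record are copied; missing fields take their default value;
     * a missing field with no default is reported as an error.
     */
    static Map<String, Object> project(List<ReadField> readSchema, Map<String, Object> written) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (ReadField field : readSchema) {
            if (written.containsKey(field.name)) {
                result.put(field.name, written.get(field.name));
            } else if (field.hasDefault) {
                result.put(field.name, field.defaultValue);
            } else {
                throw new IllegalStateException("Missing field without default: " + field.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Read schema: "id" has no default, "category" defaults to "unknown".
        List<ReadField> readSchema = new ArrayList<>();
        readSchema.add(ReadField.withoutDefault("id"));
        readSchema.add(ReadField.withDefault("category", "unknown"));

        // The written record predates the "category" field.
        Map<String, Object> written = new LinkedHashMap<>();
        written.put("id", 1L);

        System.out.println(project(readSchema, written)); // prints {id=1, category=unknown}
    }
}
```

In the actual change, this filling-in would be left to the Avro libraries once the read schema carries the defaults, so nothing like project() would need to be written by hand.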
