shenodaguirguis commented on pull request #2496:
URL: https://github.com/apache/iceberg/pull/2496#issuecomment-828701983


   > I agree w/ @RussellSpitzer on complex types. How would a deeply nested
   > structure look w/ default types?
   > I can definitely see the value in default values but I am having a hard
   > time figuring out all the downstream effects:
   > how does Spark handle these, how do they work in the Arrow based
   > vectorised readers etc. Some more context
   > on the effects on other systems would be useful for me to understand this
   > change better.
   
   Thanks for the question, @rymurr. I see your concern. In this change, we are
   adding default values as optional, so the different readers don't have to
   worry about handling them. I plan to handle default values in Spark's Avro,
   ORC, and Parquet readers, and we can tackle them one by one. I don't foresee
   any complications, except maybe for the serialization/deserialization of
   complex types.
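To illustrate what "optional" means here, below is a minimal sketch in plain Java. `SketchField` and its factory methods are hypothetical names made up for this example; they are not Iceberg's `NestedField` API. The sketch assumes that `null` means "no default declared", which a real design would have to revisit if `null` itself can be a valid default.

```java
import java.util.Optional;

// Hypothetical stand-in for a schema field; this is NOT Iceberg's NestedField API.
final class SketchField {
    private final String name;
    private final Object defaultValue; // assumption: null means "no default declared"

    private SketchField(String name, Object defaultValue) {
        this.name = name;
        this.defaultValue = defaultValue;
    }

    // Existing-style factory: no default, so callers that never set one are unaffected.
    static SketchField optional(String name) {
        return new SketchField(name, null);
    }

    // New-style factory: the same field, with a default value attached.
    static SketchField optional(String name, Object defaultValue) {
        return new SketchField(name, defaultValue);
    }

    String name() {
        return name;
    }

    // Readers that support defaults check this; readers that do not can ignore it.
    Optional<Object> defaultValue() {
        return Optional.ofNullable(defaultValue);
    }
}
```

A reader that has not been updated never calls `defaultValue()` and behaves as before, which is the sense in which the default is optional.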
   
   For the Spark Avro reader, which I will start with, here is how to handle
   default values:
   1. Converting an Avro schema to Iceberg types takes place in
   AvroSchemaUtil::convert(Schema), which takes in an avro.Schema and uses the
   SchemaToType Avro schema visitor to perform the conversion. To copy over the
   default value, we need to modify the SchemaToType::record() method to handle
   the case where the field has a default value, and use the new NestedField API
   for defaultValue.
   2. BuildAvroProjection.record() changes: this is the code path that maps the
   read schema. Here we need to add handling for fields that have a default
   value, and construct the read Avro schema correctly with those defaults.
   3. Reader-side changes: since the read Avro schema was constructed with
   default values in BuildAvroProjection.record() in the previous step, the Avro
   libraries will fill in the default value for any field that is not present
   in the data being read.
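The read-side behavior in step 3 can be sketched in plain Java. This is only a model of Avro's schema resolution rule (a read-schema field that is missing from the written data takes its default, and a missing field without a default is an error). `DefaultFillSketch`, `ReadField`, and `resolve` are hypothetical names for this illustration and do not exist in Avro or Iceberg.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DefaultFillSketch {
    // Hypothetical read-schema field that may carry a default value.
    static final class ReadField {
        final String name;
        final boolean hasDefault;
        final Object defaultValue;

        private ReadField(String name, boolean hasDefault, Object defaultValue) {
            this.name = name;
            this.hasDefault = hasDefault;
            this.defaultValue = defaultValue;
        }

        static ReadField required(String name) {
            return new ReadField(name, false, null);
        }

        static ReadField withDefault(String name, Object defaultValue) {
            return new ReadField(name, true, defaultValue);
        }
    }

    // Models resolution at read time: use the written value when the field is
    // present, otherwise fall back to the default declared in the read schema.
    static Map<String, Object> resolve(List<ReadField> readSchema, Map<String, Object> written) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (ReadField field : readSchema) {
            if (written.containsKey(field.name)) {
                row.put(field.name, written.get(field.name));
            } else if (field.hasDefault) {
                row.put(field.name, field.defaultValue);
            } else {
                throw new IllegalStateException("Missing field without default: " + field.name);
            }
        }
        return row;
    }

    public static void main(String[] args) {
        List<ReadField> readSchema = List.of(
            ReadField.required("id"),
            ReadField.withDefault("region", "unknown"));
        // The written record predates the "region" column, so only "id" is present.
        Map<String, Object> row = resolve(readSchema, Map.of("id", 7));
        System.out.println(row);
    }
}
```

The point of the sketch is that once the read schema carries the defaults (step 2), no per-field logic is needed in the reader itself; the resolution step supplies the missing values.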
   
   

