yyanyy commented on pull request #1963:
URL: https://github.com/apache/iceberg/pull/1963#issuecomment-785617395


   Thank you @rdblue for reviewing this PR again! I think I have addressed all 
the feedback. Since the visitor change is non-trivial and the original change 
didn't contain tests, in order to ensure the new code with before/afterField 
pattern doesn't impact normal avro writes, I added all the necessary tests to 
this PR (which means that #1935 will be done if the current changes are 
merged), but please feel free to suggest removing some classes for an easier 
review.
   
   One thing I'd like to mention is a behavior difference among different file 
types in handling optional struct field with required and optional sub fields. 
I think it might be fine for now since I think we don't use those statistics, 
but just want to capture the current status here:
   
   for example, if the schema is `someField, optional struct<required int1, 
optional int2>`, and there are 50 records with this schema in total, within 
which 6 records have null struct, and 3 records have struct but null int2, then 
each file format will produce the following count info:
   
   |         | valueCount | nullCount |
   |---------|------------|-----------|
   | Parquet | int1: 50, int2: 50       | int1:  6,  int2: 9    |
   | Orc     |int1: 44, int2: 50            | int1: 0, int2: 9          |
   | Avro    |int1: 44, int2: 44            |int1: 0, int2: 3          |
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to