yyanyy commented on pull request #1963: URL: https://github.com/apache/iceberg/pull/1963#issuecomment-785617395
Thank you @rdblue for reviewing this PR again! I think I have addressed all the feedback. Since the visitor change is non-trivial and the original change didn't contain tests, in order to ensure the new code with before/afterField pattern doesn't impact normal avro writes, I added all the necessary tests to this PR (which means that #1935 will be done if the current changes are merged), but please feel free to suggest removing some classes for an easier review. One thing I'd like to mention is a behavior difference among different file types in handling optional struct field with required and optional sub fields. I think it might be fine for now since I think we don't use those statistics, but just want to capture the current status here: for example, if the schema is `someField, optional struct<required int1, optional int2>`, and there are 50 records with this schema in total, within which 6 records have null struct, and 3 records have struct but null int2, then each file format will produce the following count info: | | valueCount | nullCount | |---------|------------|-----------| | Parquet | int1: 50, int2: 50 | int1: 6, int2: 9 | | Orc |int1: 44, int2: 50 | int1: 0, int2: 9 | | Avro |int1: 44, int2: 44 |int1: 0, int2: 3 | ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
