funcheetah commented on PR #4242:
URL: https://github.com/apache/iceberg/pull/4242#issuecomment-1104350142
> Thanks @funcheetah for the work here! Being able to represent unions as a struct really helps seamlessly migrate the Hive tables in our ecosystem to Iceberg without having to restate all historical data.
>
> I think this PR is a good starting step towards the goal. However for ease of reviewing, can we split the PR into two? Avro and ORC. I think we can work on finishing up the Avro side before moving on to ORC. I have made a preliminary pass on the Avro changes below.
>
> There are a couple of TODOs mentioned in the PR description. But I think there may be more things required for completeness and consistency.
>
> 1. Support in non-Spark environments (e.g. iceberg-data, flink, hive, etc.)
> 2. Support for schema pruning within a complex union
>
> These can be added in gradually, but they should be noted in the PR. And we should create separate issues for these.
>
> @RussellSpitzer @rdblue Should we create a new Project in Github to track this effort? There will be multiple PRs required to complete this work.

Thanks a lot for reviewing, @shardulm94! We can focus on reviewing the Avro changes in this PR and open another PR for ORC. Regarding tracking of follow-up PRs, what is the best way to do so? Should we create a project?
