rdblue commented on issue #244: Detect new fields from Spark StructType compared to a reference Iceberg table and provide update schema commit URL: https://github.com/apache/incubator-iceberg/pull/244#issuecomment-509023025 Thanks for working on this, @fbocse. I think it is a good idea to have a way to detects changes and prepare an update. But I think we need to be careful about some of the choices made in this commit: * New features should be done in the core Java library so any client can use them, not just Spark * It looks like this only detects new fields. What about fields that are removed? Also, using a Spark schema for detection means that this can't use schema IDs, so it can't tell the difference between a rename and a drop followed by add. That's probably another reason to build change detection between Iceberg schemas. * Usually, we try to keep recursion logic and business logic separate by using the visitor pattern. I think that adding a visitor that traverses two Iceberg schemas at the same time, like the Parquet `TypeWithSchemaVisitor`, would help simplify the code and would be useful for other tasks.
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
