[GitHub] [iceberg] rdblue commented on pull request #1177: Add schema upsert API for adding new fields and updating type (where applicable) using an input schema

GitBox Mon, 03 Aug 2020 09:47:40 -0700


rdblue commented on pull request #1177:
URL: https://github.com/apache/iceberg/pull/1177#issuecomment-668126179



   @fbocse, @rdsr, while I was looking at this, I prototyped a way to avoid a 
couple of the issues that I pointed out with this. I've posted that [union 
prototype](https://gist.github.com/rdblue/1d39763be41b59b8637d50162f7fe2ea) as 
a gist.
   
   The gist includes a `SchemaWithPartnerVisitor` pattern that is like the 
`SchemaWithTypeVisitor`. It handles the logic of traversing two schemas at the 
same time and passing "partner" objects along with the schema that is being 
traversed. The partner objects are accessed using a small set of methods, like 
`fieldPartner` to access a field's partner from a struct's partner.
   
   The gist also includes an implementation, `UnionByNameVisitor`. That visitor 
extends `SchemaWithPartnerVisitor`, where the partner is the equivalent field 
ID in the "existing" schema. By passing the existing schema's IDs, we can avoid 
keeping track of the ancestor names in the visitor. Instead, the prototype 
looks up the name in the existing schema from the ID when it needs to call 
`addColumn` with a parent name. That avoids all of the string logic and the 
need for `beforeField` callbacks. It also implements my suggestion to return a 
boolean to indicate whether a type is missing.
   
   See what you think about that prototype. I think it's good to have better 
separation between the traversal logic and the union logic, it is nice to not 
need to maintain the ancestor list, and it is a bit cleaner when detecting what 
changes need to be applied.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [iceberg] rdblue commented on pull request #1177: Add schema upsert API for adding new fields and updating type (where applicable) using an input schema

Reply via email to