rdblue commented on issue #207: Add external schema mappings for files written with name-based schemas #40
URL: https://github.com/apache/incubator-iceberg/pull/207#issuecomment-541222346

> If that is the case, then we are effectively changing the projected/read schema . . . Won't this affect Spark, Presto etc, since they are expecting the locations column?

Yes, that's what I'm suggesting. This won't affect Spark and Presto because the Iceberg schema of the data will still match. The names that we use to project data in Avro aren't actually used by Spark. Spark knows that id is at position 0 and location is going to be at position 1.

The schema that we use to configure the Avro decoder doesn't need to be the same as the one we use for records. Using a different schema is how we currently handle projection. Say your file has one column, 2: "a", and the current Iceberg schema is 1: "x" and 2: "y". The schema that we use to read is actually "x" and "y" (alias "a"), and that read schema will be different for every file schema. The important part is that the positions are correct, not the names or the concrete schema.
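To make the example above concrete, here is a rough sketch in plain Python of how a per-file read schema could be derived. It does not use Avro or Iceberg APIs. `Field`, `build_read_schema`, and `read_record` are hypothetical names made up for illustration, and filling a column that is missing from the file with `None` is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema field: an ID, a name, optional aliases."""
    field_id: int
    name: str
    aliases: Tuple[str, ...] = ()


def build_read_schema(table_schema: List[Field], file_schema: List[Field]) -> List[Field]:
    """Build a per-file read schema.

    Field order and names come from the table schema. When the file wrote
    the same field ID under a different name, that name becomes an alias.
    """
    file_names_by_id = {f.field_id: f.name for f in file_schema}
    read_fields = []
    for expected in table_schema:
        file_name = file_names_by_id.get(expected.field_id)
        if file_name is not None and file_name != expected.name:
            read_fields.append(Field(expected.field_id, expected.name, (file_name,)))
        else:
            read_fields.append(Field(expected.field_id, expected.name))
    return read_fields


def read_record(read_schema: List[Field], file_record: Dict[str, object]) -> tuple:
    """Resolve each read field by name or alias and emit a positional row.

    Assumption: a field that is absent from the file is read as None.
    """
    row = []
    for f in read_schema:
        value = None
        for candidate in (f.name, *f.aliases):
            if candidate in file_record:
                value = file_record[candidate]
                break
        row.append(value)
    return tuple(row)


# The example from the comment: the file has one column, 2: "a";
# the current table schema is 1: "x" and 2: "y".
table_schema = [Field(1, "x"), Field(2, "y")]
file_schema = [Field(2, "a")]

read_schema = build_read_schema(table_schema, file_schema)
row = read_record(read_schema, {"a": "hello"})
print(read_schema)
print(row)
```

The consumer only sees the positional row: whatever the file called the column, the value for field 2 lands at position 1, which is why the engine does not need to know the names used by the decoder.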
