[GitHub] [incubator-iceberg] rdblue commented on issue #207: Add external schema mappings for files written with name-based schemas #40

GitBox Fri, 11 Oct 2019 14:00:56 -0700

rdblue commented on issue #207: Add external schema mappings for files written 
with name-based schemas #40
URL: https://github.com/apache/incubator-iceberg/pull/207#issuecomment-541222346
 
 
   > If that is the case, then we are effectively changing the projected/read 
schema . . . Won't this affect Spark, Presto etc, since they are expecting the 
locations column?
   
   Yes, that's what I'm suggesting. This won't affect Spark and Presto because 
the Iceberg schema of the data will still match. The names that we use to 
project data in Avro aren't actually used by Spark. Spark knows that id is at 
position 0 and location is at position 1. The schema that we use to configure 
the Avro decoder doesn't need to be the same as the one we use for records.
   
   Using a different schema is how we currently handle projection. Say your 
file has one column, 2: "a", and the current Iceberg schema is 1: "x" and 2: 
"y". The schema that we use to read is actually "x" and "y" (alias "a"). That 
read schema is built per file, so it will be different for every file schema. 
The important part is that the positions are correct, not the names or the 
concrete schema.
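
   A minimal sketch of that example, in Python for brevity. It is not the 
Iceberg or Avro API: build_read_schema and read_row are hypothetical helpers, 
and the leading numbers in the example above are assumed to be field IDs. It 
shows only that the read schema keeps the table's names and positions while 
using the file's name as an alias for lookup.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical helpers for illustration only; not Iceberg or Avro APIs.

def build_read_schema(
    table_schema: List[Tuple[int, str]],
    file_schema: Dict[int, str],
) -> List[Tuple[str, Optional[str]]]:
    """Return (table_name, file_alias) pairs in table order.

    The alias is the name the file used for the same field ID,
    or None when the file does not contain that field.
    """
    return [(name, file_schema.get(field_id)) for field_id, name in table_schema]


def read_row(
    read_schema: List[Tuple[str, Optional[str]]],
    file_record: Dict[str, object],
) -> tuple:
    """Produce a positional row; fields missing from the file become None."""
    return tuple(
        file_record[alias] if alias is not None else None
        for _, alias in read_schema
    )


# Current table schema: 1: "x", 2: "y". The file has one column, 2: "a".
table_schema = [(1, "x"), (2, "y")]
file_schema = {2: "a"}

read_schema = build_read_schema(table_schema, file_schema)
row = read_row(read_schema, {"a": 42})

print(read_schema)  # [('x', None), ('y', 'a')]
print(row)          # (None, 42)
```

   The consumer only sees the positional row, so the file's name "a" never 
reaches it.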



