yyanyy commented on pull request #2275:
URL: https://github.com/apache/iceberg/pull/2275#issuecomment-797852639


   > I have some questions:
   > When we switch from v1 format to v2, and a new metadata file is written for an existing table, what schemas are written to the `schemas` list? And in the `snapshot-log`, what `schema-id` is written for the previous snapshots? (Is it not written, i.e., is null? or is it 0?)
   > In general, if we see a schema id of 0, does that ever represent a specific schema, or does that always represent some undetermined schema? Let me elaborate: (1) Will we ever see a `schema-id` of 0 in a metadata file and if so, does that refer to a unique schema? (2) In code, if we have an instance of a schema and its schemaId is 0, what are the semantics of that schemaId?
   
   Thank you for the review, and sorry for the delay in responding!
   
   I think this change applies to v1 tables as well. When an engine starts using a release that includes this change, the new `schemas` list will be written with the current schema, assigned the default schema id 0. In `snapshot-log`, previous snapshots will have a null `schema-id`, since schema ids did not exist yet when those entries were written.
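A minimal sketch of the upgrade behavior described above, under stated assumptions: the classes `SchemaTrackingUpgrade`, `TableMetadataSketch`, `SnapshotLogEntry` and the methods `upgrade` and `commit` are hypothetical and only model the idea; they are not Iceberg's actual metadata code. The sketch records the current schema under id 0 while pre-existing snapshot-log entries keep a null `schema-id`. That snapshots committed after the upgrade record the current schema id is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of table metadata; not Iceberg's TableMetadata API.
public class SchemaTrackingUpgrade {

  // One snapshot-log entry; schemaId is null when no id was recorded for it.
  static final class SnapshotLogEntry {
    final long snapshotId;
    final Integer schemaId;

    SnapshotLogEntry(long snapshotId, Integer schemaId) {
      this.snapshotId = snapshotId;
      this.schemaId = schemaId;
    }
  }

  static final class TableMetadataSketch {
    // Maps schema id -> schema; a schema is just a column-list string here.
    final Map<Integer, String> schemas = new LinkedHashMap<>();
    int currentSchemaId;
    final List<SnapshotLogEntry> snapshotLog = new ArrayList<>();
  }

  // First metadata write by a release that tracks schemas: the current schema
  // becomes the only entry in `schemas`, under the default id 0. Existing
  // snapshots were written without an id, so their schemaId stays null.
  static TableMetadataSketch upgrade(String currentSchema, List<Long> existingSnapshotIds) {
    TableMetadataSketch metadata = new TableMetadataSketch();
    metadata.schemas.put(0, currentSchema);
    metadata.currentSchemaId = 0;
    for (long snapshotId : existingSnapshotIds) {
      metadata.snapshotLog.add(new SnapshotLogEntry(snapshotId, null));
    }
    return metadata;
  }

  // Assumption: snapshots committed after the upgrade record the current schema id.
  static void commit(TableMetadataSketch metadata, long snapshotId) {
    metadata.snapshotLog.add(new SnapshotLogEntry(snapshotId, metadata.currentSchemaId));
  }

  public static void main(String[] args) {
    TableMetadataSketch metadata = upgrade("id: long, data: string", Arrays.asList(1L, 2L));
    commit(metadata, 3L);
    for (SnapshotLogEntry entry : metadata.snapshotLog) {
      System.out.println("snapshot " + entry.snapshotId + " -> schema-id " + entry.schemaId);
    }
  }
}
```

Running `main` prints `snapshot 1 -> schema-id null`, `snapshot 2 -> schema-id null`, and `snapshot 3 -> schema-id 0`: only entries written after the upgrade carry an id.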
   
   0 is a valid schema-id, and in a metadata file it refers to a unique schema: if there is no schema evolution after the table starts writing `schemas`, 0 is assigned to the current schema. In the code, the schema id only matters when interacting with table metadata. Everywhere else, when the `Schema` class is used for projection and similar purposes, `schemaId` will always be 0; that is just a default value and shouldn't be used. #2096 has some conversation around this, and this behavior is mentioned in the [schema class](https://github.com/apache/iceberg/blob/master/api/src/main/java/org/apache/iceberg/Schema.java#L44-L45).
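A hypothetical sketch of the two meanings of 0 described above. It is plain Java, not Iceberg's `Schema` or `TableMetadata` classes, and every name in it (`SchemaIdSemantics`, `SchemaSketch`, `MetadataSchemas`, `select`, `evolve`) is made up for illustration. Within the metadata's `schemas` map, id 0 names exactly one schema even after the schema evolves; a schema produced by projection carries the default id 0, which only coincides numerically with a real id and should not be used to look anything up. Assigning the next id as the highest existing id plus one is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not Iceberg's Schema or TableMetadata classes.
public class SchemaIdSemantics {

  static final class SchemaSketch {
    // Default id for schemas that do not come from table metadata.
    static final int DEFAULT_SCHEMA_ID = 0;

    final int schemaId;
    final List<String> columns;

    SchemaSketch(int schemaId, List<String> columns) {
      this.schemaId = schemaId;
      this.columns = Collections.unmodifiableList(new ArrayList<>(columns));
    }

    // Schemas built outside table metadata get the default id.
    SchemaSketch(List<String> columns) {
      this(DEFAULT_SCHEMA_ID, columns);
    }

    // Projection keeps the requested columns in schema order. The result is a
    // new schema with the default id 0, which says nothing about its origin.
    SchemaSketch select(String... names) {
      List<String> requested = Arrays.asList(names);
      List<String> kept = new ArrayList<>();
      for (String column : columns) {
        if (requested.contains(column)) {
          kept.add(column);
        }
      }
      return new SchemaSketch(kept);
    }
  }

  // Models the `schemas` list of a metadata file: id 0 names exactly one schema.
  static final class MetadataSchemas {
    final Map<Integer, SchemaSketch> byId = new LinkedHashMap<>();
    int currentSchemaId;

    MetadataSchemas(List<String> initialColumns) {
      byId.put(0, new SchemaSketch(0, initialColumns));
      currentSchemaId = 0;
    }

    // Assumption for illustration: an evolved schema gets the highest existing id + 1.
    int evolve(List<String> newColumns) {
      int newId = Collections.max(byId.keySet()) + 1;
      byId.put(newId, new SchemaSketch(newId, newColumns));
      currentSchemaId = newId;
      return newId;
    }
  }

  public static void main(String[] args) {
    MetadataSchemas metadata = new MetadataSchemas(Arrays.asList("id", "data"));
    metadata.evolve(Arrays.asList("id", "data", "ts"));
    // Id 0 in metadata still identifies the original schema.
    System.out.println(metadata.byId.get(0).columns);
    // A projection of the current schema (id 1) carries the default id 0.
    SchemaSketch projected = metadata.byId.get(metadata.currentSchemaId).select("ts", "id");
    System.out.println(projected.columns + " id=" + projected.schemaId);
  }
}
```

Running `main` prints `[id, data]` and then `[id, ts] id=0`: the projected schema reports 0 even though it was derived from schema 1, which is why that value should not be used outside table metadata.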

