anesterenok commented on issue #15236:
URL: https://github.com/apache/iceberg/issues/15236#issuecomment-3858837183

   So, if I understand you correctly, the only place in the Java code that uses 
the "default compression codec" property is new table creation.  All new tables 
are expected to have the "compression codec" property set; tables without it 
are regarded as "old tables" and are appended to with the old GZIP codec.
   
   Well, that is ok-ish from the Java implementation's perspective (though it 
certainly has to be documented somewhere), but it is not so good from an 
interoperability perspective.  Iceberg is an open format, after all.
   
   In my own case, we have a table that was created (and is appended to) by 
another application based on pyiceberg.  In the pyiceberg implementation, a 
newly created table does not have any properties.  Still, the data it appends 
is ZSTD (probably because the pyiceberg team also expected the defaults to work 
that way).  When we append data to the same table with the Java implementation, 
however, it is written as GZIP, which we did not expect (hence the issue).
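
   A minimal sketch of that mismatch, as I understand it.  This is plain Python, 
not Iceberg or pyiceberg API: the property key `compression-codec` and both 
function names are hypothetical placeholders, and the fallback values only 
mirror the behaviour described above.

```python
# Hypothetical key; the real table property name is not assumed here.
CODEC_KEY = "compression-codec"


def java_style_append_codec(properties):
    """Missing property means 'old table', so fall back to the old default."""
    return properties.get(CODEC_KEY, "gzip")


def pyiceberg_style_append_codec(properties):
    """Missing property means 'use the current default'."""
    return properties.get(CODEC_KEY, "zstd")


# A table created without any properties, as in the case described above.
table_properties = {}

codecs_in_table = {
    java_style_append_codec(table_properties),
    pyiceberg_style_append_codec(table_properties),
}
print(sorted(codecs_in_table))
```

   With the property set explicitly, both functions return the same codec; the 
two writers only disagree when it is absent.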
   
   I believe that some changes to the spec are in order.  If any table 
properties have become mandatory for new tables, they must be listed 
explicitly, just as the required properties of other objects are.  There also 
has to be a specified way of dealing with older tables that lack them 
("property evolution", if you like), e.g. a separate default value for such 
tables (GZIP in this case), or maybe even behind-the-scenes creation of these 
properties (set to the old default value).
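
   To illustrate the second option, here is a rough sketch of what 
behind-the-scenes creation of the property could look like.  Again, this is 
plain Python with hypothetical names (`CODEC_PROPERTY`, `backfill_codec`), not 
an existing Iceberg API, and it assumes the old default is GZIP as discussed 
above.

```python
# Hypothetical key and helper, for illustration only.
CODEC_PROPERTY = "compression-codec"
OLD_DEFAULT_CODEC = "gzip"


def backfill_codec(properties):
    """Return a copy of the table properties with the codec made explicit.

    A table that already declares a codec is left unchanged; a table without
    one gets the old default written into its properties, so every later
    writer reads the same value instead of applying its own fallback.
    """
    updated = dict(properties)
    updated.setdefault(CODEC_PROPERTY, OLD_DEFAULT_CODEC)
    return updated


old_table = backfill_codec({})
new_table = backfill_codec({CODEC_PROPERTY: "zstd"})
print(old_table, new_table)
```

   Once the value is stored in the table itself, the choice of codec no longer 
depends on which implementation performs the append.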


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

