Shane-Yu commented on issue #5404:
URL: https://github.com/apache/iceberg/issues/5404#issuecomment-1212867591

   I also hit this problem in the same scenario. It is not caused by "some 
delete files associated with the data file". Add a log statement at the end of 
https://github.com/apache/iceberg/blob/5a15efc070ab59eeda6343998aa065c0c9892c5c/core/src/main/java/org/apache/iceberg/DeleteFileIndex.java#L151
  to print the data file path, the delete file path, and the lower and upper 
bounds. You will see that the lower and upper bounds are not complete file 
paths; they are truncated to 16 characters. This can lead to false positives 
when determining whether a delete file references a data file. From the source code 
https://github.com/apache/iceberg/blob/5a15efc070ab59eeda6343998aa065c0c9892c5c/core/src/main/java/org/apache/iceberg/MetricsConfig.java#L52
  you can see that DEFAULT_WRITE_METRICS_MODE_DEFAULT is truncate(16). The 
lower and upper bounds of the file path were truncated when the file was 
written, which leads to the misjudgment at commit time in rewrite data files. 
        To resolve this problem, set a table property like this:
   `  alter table iceberg_table set tblproperties (
     'write.metadata.metrics.default'='full'
    );`
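
    To illustrate why truncated bounds produce the false positive, here is a minimal Python sketch. It is a simplified model, not Iceberg's actual code: the function names `path_bounds` and `may_reference`, the prefix comparison, and the example paths are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical paths; both share the same first 16 characters.
PATH_A = "s3://bucket/warehouse/db/tbl/data/file-a.parquet"
PATH_B = "s3://bucket/warehouse/db/tbl/data/file-b.parquet"


def path_bounds(paths: List[str], width: Optional[int]) -> Tuple[str, str]:
    """Lower/upper bounds of the file paths a delete file references.

    width=None models the 'full' metrics mode, an integer models truncate(width).
    """
    lower, upper = min(paths), max(paths)
    if width is None:
        return lower, upper
    return lower[:width], upper[:width]


def may_reference(data_path: str, lower: str, upper: str) -> bool:
    """Simplified range check: compare only as many characters as each bound has."""
    return lower <= data_path[: len(lower)] and data_path[: len(upper)] <= upper


# The delete file only references PATH_A.
truncated = path_bounds([PATH_A], 16)
full = path_bounds([PATH_A], None)

print(may_reference(PATH_B, *truncated))  # True: false positive, prefixes are identical
print(may_reference(PATH_B, *full))       # False: full bounds exclude PATH_B
```

    With truncated bounds every data file under the same table location looks like a possible match, which is consistent with the behavior described above.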
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

