sweetpythoncode commented on issue #6306:
URL: https://github.com/apache/iceberg/issues/6306#issuecomment-1471838942
@szehon-ho @dramaticlly Thanks for your reply! I ran the full data clear because of another issue: if you run `add_files` with `check_duplicate_files => false` two or more times, it generates incorrect metadata for `record_count` and `file_size`. For example:
```sql
CALL iceberg_catalog.system.add_files(
  table => 'test.test_name',
  source_table => '`orc`.`s3://bucket/data/`',
  check_duplicate_files => false
)
```
This returns, in the metadata for a specific file (I check the metadata through Trino; see the query sketch below): `record_count -> 1`, `file_size -> 500`.
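For reference, this is roughly how I inspect the per-file metadata. It is a minimal sketch using Trino's Iceberg connector, which exposes a `"table$files"` metadata table; it assumes the Trino catalog is also named `iceberg_catalog` (yours may differ):

```sql
-- Per-file metadata as tracked by Iceberg; quoting is required because of the $.
SELECT file_path, record_count, file_size_in_bytes
FROM iceberg_catalog.test."test_name$files"
ORDER BY file_path;
```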
I run the procedure once again because new partitions were added in the path:
```sql
CALL iceberg_catalog.system.add_files(
  table => 'test.test_name',
  source_table => '`orc`.`s3://bucket/data/`',
  check_duplicate_files => false
)
```
and now it returns, for the same file (which was already registered in the first step): `record_count -> 2`, `file_size -> 1000`.
So every run of **add_files** with **check_duplicate_files** disabled sums the metadata of the same files again, unless they are new in the path. If I run:

```sql
DELETE FROM test.test_name WHERE 1 = 1;
```

the old files are matched as new again, and the issue is fixed by this workaround.
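For completeness, a sketch of the same call with the duplicate check left on. As far as I understand, `check_duplicate_files` defaults to `true`, in which case a repeated run should fail with a duplicate-files error instead of silently double-counting, which at least surfaces the problem:

```sql
CALL iceberg_catalog.system.add_files(
  table => 'test.test_name',
  source_table => '`orc`.`s3://bucket/data/`',
  check_duplicate_files => true  -- default; rejects files the table already references
)
```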