Hi,

We are in the process of converting Hive datasets to Iceberg datasets.

In this process, we noticed that each data-file entry in the manifest file
has a required record_count field.

Populating this accurately would require reading the footer of each
Parquet/ORC file. For Avro files, it would require reading the header of
every block and summing the per-block record counts.
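
To illustrate the Avro case, here is a minimal Python sketch of such a scan. It assumes the standard Avro object container layout (magic bytes, metadata map, 16-byte sync marker, then blocks that each start with a record count and a byte size). The function names are hypothetical and are not part of Iceberg or the Avro library; a production job would more likely rely on the Avro library's own file reader.

```python
MAGIC = b"Obj\x01"   # Avro object container file magic
SYNC_SIZE = 16       # length of the sync marker that follows every block


def read_long(stream):
    """Decode one zigzag varint (Avro long); return None on a clean EOF."""
    shift = 0
    acc = 0
    while True:
        byte = stream.read(1)
        if not byte:
            if shift == 0:
                return None
            raise EOFError("truncated varint")
        acc |= (byte[0] & 0x7F) << shift
        if not (byte[0] & 0x80):
            break
        shift += 7
    return (acc >> 1) ^ -(acc & 1)


def skip_header(stream):
    """Skip the magic bytes, the metadata map, and the header sync marker."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro container file")
    while True:
        count = read_long(stream)
        if count is None:
            raise EOFError("truncated header")
        if count == 0:           # a zero count terminates the map
            break
        if count < 0:            # negative count: a byte size follows
            count = -count
            read_long(stream)
        for _ in range(count):
            stream.seek(read_long(stream), 1)   # skip key (string)
            stream.seek(read_long(stream), 1)   # skip value (bytes)
    stream.seek(SYNC_SIZE, 1)


def count_avro_records(stream):
    """Sum per-block record counts, seeking past each block's payload."""
    skip_header(stream)
    total = 0
    while True:
        records = read_long(stream)
        if records is None:
            return total
        size = read_long(stream)
        stream.seek(size + SYNC_SIZE, 1)        # skip payload and sync marker
        total += records
```

With a real file this would be called as `count_avro_records(open(path, "rb"))`. Only the block headers are decoded, but every block still has to be visited, so the cost grows with the number of blocks rather than being a single footer read.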

Is the record_count in the data-file entry expected to be accurate, or can
we estimate it from the file size and an estimated row size?

Thanks
Vivek
