szehon-ho edited a comment on issue #3263:
URL: https://github.com/apache/iceberg/issues/3263#issuecomment-939201068


   I was thinking of explicitly setting it to -1 when constructing the DataFile in 
listAvroPartitions, but I am worried that it's setting wrong metadata (though it is 
already explicitly set to -1 for Metrics).
   
   Ref: TableMigrationUtil::listAvroPartitions
   
   ```
   // -1 is passed as the record count; the other metrics are left null.
   Metrics metrics = new Metrics(-1L, null, null, null);
   String partitionKey = spec.fields().stream()
       .map(PartitionField::name)
       .map(name -> String.format("%s=%s", name, partitionPath.get(name)))
       .collect(Collectors.joining("/"));

   return DataFiles.builder(spec)
       .withPath(stat.getPath().toString())
       .withFormat("avro")
       .withFileSizeInBytes(stat.getLen())
       .withMetrics(metrics)
       .withPartitionPath(partitionKey)
       .build();
   ```
   
   It doesn't seem like an Avro file has any metadata for quickly getting the 
recordCount. I wonder if running a Spark job to read the number of rows is 
overkill? Any thoughts, @RussellSpitzer and @aokolnychyi? Or is the add_file 
case for Avro designed to be OK without setting the record count?
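
One possible middle ground, sketched here only as an illustration: the Avro object container format has no file-level row total, but each data block header starts with the number of objects in that block. Summing those headers gives the record count without decoding any rows. The class and method names below (`AvroBlockCounter`, `countRecords`) are hypothetical and are not part of Iceberg or the Avro library. The sketch uses only the JDK and assumes a well-formed container file.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper, not an Iceberg or Avro API.
class AvroBlockCounter {
  private static final byte[] MAGIC = {'O', 'b', 'j', 1};
  private static final int SYNC_SIZE = 16;

  // Sums the per-block object counts of an Avro object container file.
  static long countRecords(InputStream raw) throws IOException {
    DataInputStream in = new DataInputStream(raw);
    byte[] magic = new byte[MAGIC.length];
    in.readFully(magic);
    if (!Arrays.equals(magic, MAGIC)) {
      throw new IOException("Not an Avro object container file");
    }
    // The header metadata is an Avro map: blocks of key/value pairs, ended by a 0 count.
    for (long n = readLong(in); n != 0; n = readLong(in)) {
      if (n < 0) {
        n = -n;
        readLong(in); // a negative count is followed by the block's byte size
      }
      for (long i = 0; i < n; i++) {
        skip(in, readLong(in)); // key bytes
        skip(in, readLong(in)); // value bytes
      }
    }
    skip(in, SYNC_SIZE); // header sync marker

    long total = 0;
    while (true) {
      int first = in.read();
      if (first < 0) {
        return total; // clean end of file
      }
      long count = readLong(first, in); // objects in this block
      long size = readLong(in);         // serialized (possibly compressed) byte size
      skip(in, size + SYNC_SIZE);       // skip the payload and the trailing sync marker
      total += count;
    }
  }

  // Reads a zig-zag, variable-length long as defined by the Avro binary encoding.
  private static long readLong(DataInputStream in) throws IOException {
    return readLong(in.readUnsignedByte(), in);
  }

  private static long readLong(int first, DataInputStream in) throws IOException {
    long n = first & 0x7F;
    int shift = 7;
    int b = first;
    while ((b & 0x80) != 0) {
      b = in.readUnsignedByte();
      n |= (long) (b & 0x7F) << shift;
      shift += 7;
    }
    return (n >>> 1) ^ -(n & 1);
  }

  // Skips exactly n bytes, failing with EOFException if the stream ends early.
  private static void skip(DataInputStream in, long n) throws IOException {
    while (n > 0) {
      long skipped = in.skip(n);
      if (skipped <= 0) {
        in.readUnsignedByte();
        skipped = 1;
      }
      n -= skipped;
    }
  }
}
```

This reads only the block headers and skips each payload, so the cost depends mostly on how cheaply the underlying stream can skip bytes. Whether that is acceptable inside listAvroPartitions is a separate question.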
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


