firasomrane opened a new issue, #14077: URL: https://github.com/apache/iceberg/issues/14077
### Feature Request / Improvement

## Description:

- Problem: Today CommitReport exposes the overall commit [total-duration](https://github.com/apache/iceberg/blob/4f06b0911e870ad4ef329e77d68ae5ebe9135961/core/src/main/java/org/apache/iceberg/metrics/CommitMetrics.java#L28) and counters, but there is no visibility into the time spent constructing the on-disk metadata files that are part of the new snapshot. This is often a significant portion of commit latency, and I want to monitor it to catch problems with big metadata files.

Specifically:

- Time to build/write the manifest-list files.
- Time to serialize and write the new metadata.json.

### Ask:

Add optional timers to CommitReport metrics to break down commit time by:

- manifest-list-build-duration
- metadata-json-write-duration

### Proposal (high level):

- Add two optional timers to [CommitMetrics](https://github.com/apache/iceberg/blob/4f06b0911e870ad4ef329e77d68ae5ebe9135961/core/src/main/java/org/apache/iceberg/metrics/CommitMetrics.java#L27) and [CommitMetricsResult](https://github.com/apache/iceberg/blob/4f06b0911e870ad4ef329e77d68ae5ebe9135961/core/src/main/java/org/apache/iceberg/metrics/CommitMetricsResult.java#L61):
  - manifest-list-build-duration (nanoseconds)
  - metadata-json-write-duration (nanoseconds)
- Instrument the commit path to measure:
  - Manifest list build time [inside SnapshotProducer.apply()](https://github.com/apache/iceberg/blob/be577eeac631d77243beb57409e476bf197f79d7/core/src/main/java/org/apache/iceberg/SnapshotProducer.java#L265-L286).
  - Table metadata write time inside `TableOperations.commit(...)` implementations (Hadoop, JDBC, REST).
- Include these timers in CommitReport → metrics JSON. This is backward compatible since metrics are a map.

## Implementation details

### Manifest List Build time: where to add the timer.
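Before pointing at the exact spot in `SnapshotProducer.apply()`, here is a minimal, self-contained sketch of the timing pattern the proposal implies: a try-with-resources handle that records elapsed nanoseconds under a metric name and leaves the entry absent when a phase never runs. `CommitPhaseTimers` and `Timed` are hypothetical names invented for this illustration and are not Iceberg classes; a real change would presumably build on the existing timer support in `org.apache.iceberg.metrics` instead. The sketch is not thread-safe.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Iceberg.
final class CommitPhaseTimers {
  // Metric names taken from the proposal.
  static final String MANIFEST_LIST_BUILD = "manifest-list-build-duration";
  static final String METADATA_JSON_WRITE = "metadata-json-write-duration";

  // Accumulated nanoseconds per metric name; a phase that never ran has no entry.
  private final Map<String, Long> totalNanos = new LinkedHashMap<>();

  /** Handle whose close() records the elapsed time; close() throws no checked exception. */
  public interface Timed extends AutoCloseable {
    @Override
    void close();
  }

  /** Starts timing the named phase; closing the returned handle adds the elapsed nanoseconds. */
  Timed start(String name) {
    long startNanos = System.nanoTime();
    return () -> totalNanos.merge(name, System.nanoTime() - startNanos, Long::sum);
  }

  /** Read-only view of the recorded durations, keyed by metric name. */
  Map<String, Long> asMap() {
    return Collections.unmodifiableMap(totalNanos);
  }

  public static void main(String[] args) {
    CommitPhaseTimers timers = new CommitPhaseTimers();

    // The timed region ends when the try block exits, even if it throws.
    try (Timed ignored = timers.start(MANIFEST_LIST_BUILD)) {
      StringBuilder standIn = new StringBuilder(); // stand-in for writing the manifest list
      for (int i = 0; i < 10_000; i++) {
        standIn.append(i);
      }
    }

    // Only the phase that was timed shows up in the map.
    System.out.println(timers.asMap());
  }
}
```

Because only phases that were actually timed appear in the map, a consumer that does not know the new keys never has to handle them, which is the sense in which the timers are meant to be optional.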
```
public Snapshot apply() {
  refresh();
  Snapshot parentSnapshot = SnapshotUtil.latestSnapshot(base, targetBranch);
  validate(base, parentSnapshot);
  List<ManifestFile> manifests = apply(base, parentSnapshot);
  OutputFile manifestList = manifestListPath();

  // Start timer for manifest list build time
  ManifestListWriter writer =
      ManifestLists.write(
          ops.current().formatVersion(),
          manifestList,
          snapshotId(),
          parentSnapshotId,
          sequenceNumber,
          base.nextRowId());

  try (writer) {
    manifestLists.add(manifestList.location());
    ManifestFile[] manifestFiles = new ManifestFile[manifests.size()];
    Tasks.range(manifestFiles.length)
        .executeWith(workerPool())
        .run(index -> manifestFiles[index] = manifestsWithMetadata.get(manifests.get(index)));
    writer.addAll(Arrays.asList(manifestFiles));
    // End timer for manifest list build time
  } catch (IOException e) {
    throw new RuntimeIOException(e, "Failed to write manifest list file");
  }
  // ... remainder of apply() omitted
}
```

### Metadata file Build time: where to add the timer.

- inside `TableOperations.commit(...)` implementations:
  - Hadoop, inside [`HadoopTableOperations.java`](https://github.com/apache/iceberg/blob/7f14032be8c0538bfa59aba9951ec8a6001035e3/core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java#L149-L163)
  - REST, inside [`RESTTableOperations.java`](https://github.com/apache/iceberg/blob/a2b8008da7bc26e03248a35eeee60d1cc7e8499d/core/src/main/java/org/apache/iceberg/rest/RESTTableOperations.java#L116-151)

### Query engine

Spark

### Willingness to contribute

- [ ] I can contribute this improvement/feature independently
- [ ] I would be willing to contribute this improvement/feature with guidance from the Iceberg community
- [ ] I cannot contribute this improvement/feature at this time

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
