[ https://issues.apache.org/jira/browse/GOBBLIN-2159?focusedWorklogId=939760&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-939760 ]
ASF GitHub Bot logged work on GOBBLIN-2159:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 23/Oct/24 17:21
            Start Date: 23/Oct/24 17:21
    Worklog Time Spent: 10m
      Work Description: Blazer-007 commented on code in PR #4058:
URL: https://github.com/apache/gobblin/pull/4058#discussion_r1813222492

##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/iceberg/IcebergTable.java:
##########
@@ -237,31 +238,35 @@ protected void registerIcebergTable(TableMetadata srcMetadata, TableMetadata dst
    * @throws RuntimeException if error occurred while reading the manifest file
    */
   public List<DataFile> getPartitionSpecificDataFiles(Predicate<StructLike> icebergPartitionFilterPredicate)
-      throws TableNotFoundException {
+      throws IOException {
     TableMetadata tableMetadata = accessTableMetadata();
     Snapshot currentSnapshot = tableMetadata.currentSnapshot();
     long currentSnapshotId = currentSnapshot.snapshotId();
     List<DataFile> knownDataFiles = new ArrayList<>();
-    log.info("~{}~ for snapshot '{}' - '{}' total known iceberg datafiles", tableId, currentSnapshotId,
-        knownDataFiles.size());
+    GrowthMilestoneTracker growthMilestoneTracker = new GrowthMilestoneTracker();
     //TODO: Add support for deleteManifests as well later
     // Currently supporting dataManifests only
     List<ManifestFile> dataManifestFiles = currentSnapshot.dataManifests(this.tableOps.io());
     for (ManifestFile manifestFile : dataManifestFiles) {
+      if (growthMilestoneTracker.isAnotherMilestone(knownDataFiles.size())) {
+        log.info("~{}~ for snapshot '{}' - before manifest-file '{}' '{}' total known iceberg datafiles", tableId,
+            currentSnapshotId,
+            manifestFile.path(),
+            knownDataFiles.size()
+        );
+      }

Review Comment:
   Yes, that seems like a valid approach. Let me remove growthMilestoneTracker from that function.

Issue Time Tracking
-------------------

    Worklog Id:     (was: 939760)
    Time Spent: 12h  (was: 11h 50m)

> Support Partition Based Copy in Iceberg Distcp
> ----------------------------------------------
>
>                 Key: GOBBLIN-2159
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-2159
>             Project: Apache Gobblin
>          Issue Type: Task
>            Reporter: Vivek Rai
>            Priority: Major
>          Time Spent: 12h
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)