meethngala commented on code in PR #3663:
URL: https://github.com/apache/gobblin/pull/3663#discussion_r1144097168


##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/iceberg/IcebergRegisterStep.java:
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.data.management.copy.iceberg;
+
+import java.io.IOException;
+
+import org.apache.iceberg.TableMetadata;
+
+import lombok.AllArgsConstructor;
+import lombok.extern.slf4j.Slf4j;
+
+import org.apache.gobblin.commit.CommitStep;
+
+/**
+ * {@link CommitStep} to perform Iceberg registration.
+ */
+
+@Slf4j
+@AllArgsConstructor
+public class IcebergRegisterStep implements CommitStep {
+
+  private final IcebergTable srcIcebergTable;
+  private final IcebergTable existingTargetIcebergTable;
+
+  @Override
+  public boolean isCompleted() throws IOException {
+    return false;
+  }
+
+  @Override
+  public void execute() throws IOException {
+    TableMetadata targetMetadata = null;
+    try {
+      targetMetadata = this.existingTargetIcebergTable.accessTableMetadata();
+    } catch (IcebergTable.TableNotFoundException tnfe) {
+      log.warn("Target TableMetadata doesn't exist because : {}" , tnfe);
+    }
+    this.srcIcebergTable.registerIcebergTable(this.srcIcebergTable.accessTableMetadata(), targetMetadata);

Review Comment:
   That totally makes sense! I have changed it to 
`existingDestinationIcebergTable` in my latest commit, since it's the 
destination-side table that we need to use when registering. This was also 
replaced incorrectly during the refactoring that introduced src and dst.
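
   For illustration, here is a minimal, self-contained sketch of the intended 
call shape, with the destination-side table as the receiver of the 
registration. Everything in it is an assumption made for the example: 
`InMemoryTable` and the unchecked `TableNotFoundException` are hypothetical 
stand-ins for `IcebergTable` and `IcebergTable.TableNotFoundException`, and 
metadata is modeled as a plain `String` rather than `TableMetadata`. It is not 
the code from the commit.

```java
import java.util.ArrayList;
import java.util.List;

public class RegisterStepSketch {

  /** Hypothetical, unchecked stand-in for IcebergTable.TableNotFoundException. */
  static class TableNotFoundException extends RuntimeException {
    TableNotFoundException(String message) {
      super(message);
    }
  }

  /** Hypothetical stand-in for IcebergTable; metadata is modeled as a plain String. */
  static class InMemoryTable {
    private final String metadata; // null models a table that does not exist yet
    final List<String> registrations = new ArrayList<>();

    InMemoryTable(String metadata) {
      this.metadata = metadata;
    }

    String accessTableMetadata() {
      if (metadata == null) {
        throw new TableNotFoundException("no metadata for this table");
      }
      return metadata;
    }

    /** Records the call so we can see which table received the registration. */
    void registerIcebergTable(String srcMetadata, String dstMetadata) {
      registrations.add(srcMetadata + " -> " + dstMetadata);
    }
  }

  /** Mirrors the shape of execute() in the diff, with the destination table as receiver. */
  static void execute(InMemoryTable srcIcebergTable, InMemoryTable existingDestinationIcebergTable) {
    String dstMetadata = null;
    try {
      dstMetadata = existingDestinationIcebergTable.accessTableMetadata();
    } catch (TableNotFoundException tnfe) {
      // Destination does not exist yet: proceed with null metadata, as the diff does.
    }
    existingDestinationIcebergTable.registerIcebergTable(srcIcebergTable.accessTableMetadata(), dstMetadata);
  }

  public static void main(String[] args) {
    InMemoryTable src = new InMemoryTable("src-metadata");
    InMemoryTable dst = new InMemoryTable(null);
    execute(src, dst);
    System.out.println("src registrations: " + src.registrations);
    System.out.println("dst registrations: " + dst.registrations);
  }
}
```

   In this sketch only the destination stand-in records a registration; the 
source table is read but never written to.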



##########
gobblin-data-management/src/test/java/org/apache/gobblin/data/management/copy/iceberg/IcebergDatasetTest.java:
##########
@@ -98,6 +101,10 @@ public class IcebergDatasetTest {
      new MockIcebergTable.SnapshotPaths(Optional.empty(), MANIFEST_LIST_PATH_1, Arrays.asList(
           new IcebergSnapshotInfo.ManifestFileInfo(MANIFEST_PATH_1,
               Arrays.asList(MANIFEST_DATA_PATH_1A, MANIFEST_DATA_PATH_1B))));
+  private static final MockIcebergTable.SnapshotPaths SNAPSHOT_PATHS_2 =
+      new MockIcebergTable.SnapshotPaths(Optional.empty(), Strings.EMPTY, Arrays.asList(
+          new IcebergSnapshotInfo.ManifestFileInfo(Strings.EMPTY,
+              Arrays.asList(Strings.EMPTY))));

Review Comment:
   I have removed it altogether, since I now create the destination-side 
table with the already existing snapshot, i.e. `SNAPSHOT_PATHS_1`.



##########
gobblin-data-management/src/test/java/org/apache/gobblin/data/management/copy/iceberg/IcebergDatasetTest.java:
##########
@@ -382,23 +402,36 @@ private static void verifyCopyEntities(Collection<CopyEntity> copyEntities, List
     List<String> actual = new ArrayList<>();
     for (CopyEntity copyEntity : copyEntities) {
       String json = copyEntity.toString();
-      String filepath = CopyEntityDeserializer.getFilePathAsStringFromJson(json);
-      actual.add(filepath);
+      if (isCopyableFile(json)) {
+        String filepath = CopyEntityDeserializer.getFilePathAsStringFromJson(json);
+        actual.add(filepath);
+      }
     }
     Assert.assertEquals(actual.size(), expected.size(), "Set" + actual.toString() + " vs Set" + expected.toString());
     Assert.assertEqualsNoOrder(actual.toArray(), expected.toArray());
   }
 
+  private static boolean isCopyableFile(String json) {

Review Comment:
   That's true, and yes, I hadn't added the verification step for the commit. 
Thanks for pointing that out. I have added a verification in my latest commit!



##########
gobblin-data-management/src/test/java/org/apache/gobblin/data/management/copy/iceberg/IcebergDatasetTest.java:
##########
@@ -253,15 +263,19 @@ public void testGenerateCopyEntitiesMultiSnapshotWhenDestEmpty() throws IOExcept
     FileSystem sourceFs = sourceBuilder.build();
 
     IcebergTable icebergTable = MockIcebergTable.withSnapshots(Arrays.asList(SNAPSHOT_PATHS_1, SNAPSHOT_PATHS_0));
+    IcebergTable targetTable = MockIcebergTable.withSnapshots(Arrays.asList(SNAPSHOT_PATHS_2));
     IcebergDataset icebergDataset =
-        new TrickIcebergDataset(testDbName, testTblName, icebergTable, new Properties(), sourceFs);
+        new TrickIcebergDataset(testDbName, testTblName, icebergTable, targetTable, new Properties(), sourceFs);
 
     MockFileSystemBuilder destBuilder = new MockFileSystemBuilder(DEST_FS_URI);
     FileSystem destFs = destBuilder.build();
 
     CopyConfiguration copyConfiguration =
         CopyConfiguration.builder(destFs, copyConfigProperties).preserve(PreserveAttributes.fromMnemonicString(""))
             .copyContext(new CopyContext()).build();
+    try (MockedConstruction<PostPublishStep> mockedPostPublishStep = mockConstruction(PostPublishStep.class)) {
+      PostPublishStep step = new PostPublishStep(icebergDataset.getFileSetId(), Maps.newHashMap(), new IcebergRegisterStep(icebergTable, targetTable), 0);

Review Comment:
   I thought the constructor needed to be mocked because, while testing, I ran 
into issues with `PostPublishStep` failing to instantiate. Further debugging 
showed that this step can be avoided: all we need is to use `mockito-inline` 
instead of `mockito-core`, which resolved the issue for me.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

