meethngala commented on code in PR #3663:
URL: https://github.com/apache/gobblin/pull/3663#discussion_r1144097168

##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/iceberg/IcebergRegisterStep.java:
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.data.management.copy.iceberg;
+
+import java.io.IOException;
+
+import org.apache.iceberg.TableMetadata;
+
+import lombok.AllArgsConstructor;
+import lombok.extern.slf4j.Slf4j;
+
+import org.apache.gobblin.commit.CommitStep;
+
+/**
+ * {@link CommitStep} to perform Iceberg registration.
+ */
+
+@Slf4j
+@AllArgsConstructor
+public class IcebergRegisterStep implements CommitStep {
+
+  private final IcebergTable srcIcebergTable;
+  private final IcebergTable existingTargetIcebergTable;
+
+  @Override
+  public boolean isCompleted() throws IOException {
+    return false;
+  }
+
+  @Override
+  public void execute() throws IOException {
+    TableMetadata targetMetadata = null;
+    try {
+      targetMetadata = this.existingTargetIcebergTable.accessTableMetadata();
+    } catch (IcebergTable.TableNotFoundException tnfe) {
+      log.warn("Target TableMetadata doesn't exist because : {}" , tnfe);
+    }
+    this.srcIcebergTable.registerIcebergTable(this.srcIcebergTable.accessTableMetadata(), targetMetadata);

Review Comment:
   That totally makes sense! I have changed it to `existingDestinationIcebergTable` in my latest commit, since it's the destination-side table that we need to use while registering. Somehow, this was also wrongly replaced while refactoring to define src and dst.


##########
gobblin-data-management/src/test/java/org/apache/gobblin/data/management/copy/iceberg/IcebergDatasetTest.java:
##########
@@ -98,6 +101,10 @@ public class IcebergDatasetTest {
       new MockIcebergTable.SnapshotPaths(Optional.empty(), MANIFEST_LIST_PATH_1, Arrays.asList(
           new IcebergSnapshotInfo.ManifestFileInfo(MANIFEST_PATH_1,
               Arrays.asList(MANIFEST_DATA_PATH_1A, MANIFEST_DATA_PATH_1B))));
+  private static final MockIcebergTable.SnapshotPaths SNAPSHOT_PATHS_2 =
+      new MockIcebergTable.SnapshotPaths(Optional.empty(), Strings.EMPTY, Arrays.asList(
+          new IcebergSnapshotInfo.ManifestFileInfo(Strings.EMPTY,
+              Arrays.asList(Strings.EMPTY))));

Review Comment:
   I have gotten rid of it altogether, since I am now creating the destination-side table with an already existing snapshot, i.e. `SNAPSHOT_PATHS_1`.


##########
gobblin-data-management/src/test/java/org/apache/gobblin/data/management/copy/iceberg/IcebergDatasetTest.java:
##########
@@ -382,23 +402,36 @@ private static void verifyCopyEntities(Collection<CopyEntity> copyEntities, List
     List<String> actual = new ArrayList<>();
     for (CopyEntity copyEntity : copyEntities) {
       String json = copyEntity.toString();
-      String filepath = CopyEntityDeserializer.getFilePathAsStringFromJson(json);
-      actual.add(filepath);
+      if (isCopyableFile(json)) {
+        String filepath = CopyEntityDeserializer.getFilePathAsStringFromJson(json);
+        actual.add(filepath);
+      }
     }
     Assert.assertEquals(actual.size(), expected.size(), "Set" + actual.toString() + " vs Set" + expected.toString());
     Assert.assertEqualsNoOrder(actual.toArray(), expected.toArray());
   }
+  private static boolean isCopyableFile(String json) {

Review Comment:
   That's true, and yes, I hadn't added the verification step for the commit. Thanks for pointing that out. I have added a verification in my latest commit!


##########
gobblin-data-management/src/test/java/org/apache/gobblin/data/management/copy/iceberg/IcebergDatasetTest.java:
##########
@@ -253,15 +263,19 @@ public void testGenerateCopyEntitiesMultiSnapshotWhenDestEmpty() throws IOExcept
     FileSystem sourceFs = sourceBuilder.build();
     IcebergTable icebergTable = MockIcebergTable.withSnapshots(Arrays.asList(SNAPSHOT_PATHS_1, SNAPSHOT_PATHS_0));
+    IcebergTable targetTable = MockIcebergTable.withSnapshots(Arrays.asList(SNAPSHOT_PATHS_2));
     IcebergDataset icebergDataset =
-        new TrickIcebergDataset(testDbName, testTblName, icebergTable, new Properties(), sourceFs);
+        new TrickIcebergDataset(testDbName, testTblName, icebergTable, targetTable, new Properties(), sourceFs);
     MockFileSystemBuilder destBuilder = new MockFileSystemBuilder(DEST_FS_URI);
     FileSystem destFs = destBuilder.build();
     CopyConfiguration copyConfiguration =
         CopyConfiguration.builder(destFs, copyConfigProperties).preserve(PreserveAttributes.fromMnemonicString(""))
             .copyContext(new CopyContext()).build();
+    try (MockedConstruction<PostPublishStep> mockedPostPublishStep = mockConstruction(PostPublishStep.class)) {
+      PostPublishStep step = new PostPublishStep(icebergDataset.getFileSetId(), Maps.newHashMap(), new IcebergRegisterStep(icebergTable, targetTable), 0);

Review Comment:
   I thought the constructor needed to be mocked, since while testing I was running into issues with `PostPublishStep` failing to instantiate. Debugging further helped me understand that this step can be avoided: all we need is to use `mockito-inline` instead of `mockito-core`, which resolved the issue for me.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
