sririshindra commented on code in PR #7121:
URL: https://github.com/apache/iceberg/pull/7121#discussion_r1141193297


##########
spark/v3.3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMigrateTableProcedure.java:
##########
@@ -122,6 +122,38 @@ public void testMigrateWithDropBackup() throws IOException {
     Assert.assertFalse(spark.catalog().tableExists(tableName + "_BACKUP_"));
   }
 
+  @Test
+  public void testMigrateWithBackupSuffix() throws IOException {
+    Assume.assumeTrue(catalogName.equals("spark_catalog"));
+    String backupSuffix = "_tmp";
+    String location = temp.newFolder().toString();
+    sql(
+        "CREATE TABLE %s (id bigint NOT NULL, data string) USING parquet LOCATION '%s'",
+        tableName, location);
+    sql("INSERT INTO TABLE %s VALUES (1, 'a')", tableName);
+
+    Object result =
+        scalarSql(
+            "CALL %s.system.migrate(table => '%s', backup_suffix => '%s')",
+            catalogName, tableName, backupSuffix);
+
+    Assert.assertEquals("Should have added one file", 1L, result);
+
+    Table createdTable = validationCatalog.loadTable(tableIdent);
+
+    String tableLocation = createdTable.location().replace("file:", "");
+    Assert.assertEquals("Table should have original location", location, tableLocation);
+
+    sql("INSERT INTO TABLE %s VALUES (1, 'a')", tableName);
+
+    assertEquals(
+        "Should have expected rows",
+        ImmutableList.of(row(1L, "a"), row(1L, "a")),

Review Comment:
   Can you make this `ImmutableList.of(row(1L, "a"), row(2L, "b"))`? Otherwise there is no need for the "ORDER BY id" on the next line, since both ids are the same.



##########
spark/v3.3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMigrateTableProcedure.java:
##########
@@ -122,6 +122,38 @@ public void testMigrateWithDropBackup() throws IOException {
     Assert.assertFalse(spark.catalog().tableExists(tableName + "_BACKUP_"));
   }
 
+  @Test
+  public void testMigrateWithBackupSuffix() throws IOException {
+    Assume.assumeTrue(catalogName.equals("spark_catalog"));
+    String backupSuffix = "_tmp";
+    String location = temp.newFolder().toString();
+    sql(
+        "CREATE TABLE %s (id bigint NOT NULL, data string) USING parquet LOCATION '%s'",
+        tableName, location);
+    sql("INSERT INTO TABLE %s VALUES (1, 'a')", tableName);
+
+    Object result =
+        scalarSql(
+            "CALL %s.system.migrate(table => '%s', backup_suffix => '%s')",
+            catalogName, tableName, backupSuffix);
+
+    Assert.assertEquals("Should have added one file", 1L, result);
+
+    Table createdTable = validationCatalog.loadTable(tableIdent);
+
+    String tableLocation = createdTable.location().replace("file:", "");
+    Assert.assertEquals("Table should have original location", location, tableLocation);
+
+    sql("INSERT INTO TABLE %s VALUES (1, 'a')", tableName);

Review Comment:
   nit: Can you make the value (2, 'b')?



##########
spark/v3.3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMigrateTableProcedure.java:
##########
@@ -122,6 +122,38 @@ public void testMigrateWithDropBackup() throws IOException {
     Assert.assertFalse(spark.catalog().tableExists(tableName + "_BACKUP_"));
   }
 
+  @Test
+  public void testMigrateWithBackupSuffix() throws IOException {
+    Assume.assumeTrue(catalogName.equals("spark_catalog"));
+    String backupSuffix = "_tmp";
+    String location = temp.newFolder().toString();
+    sql(
+        "CREATE TABLE %s (id bigint NOT NULL, data string) USING parquet LOCATION '%s'",
+        tableName, location);
+    sql("INSERT INTO TABLE %s VALUES (1, 'a')", tableName);
+
+    Object result =
+        scalarSql(
+            "CALL %s.system.migrate(table => '%s', backup_suffix => '%s')",

Review Comment:
   Maybe you could add a separate test that covers the following:
   "CALL %s.system.migrate(table => '%s', drop_backup => true, backup_suffix => '%s')"
   
   It wouldn't hurt to test this combination, as the code may change in the future in a way that breaks the method chaining. Adding a unit test would ensure that doesn't cause any issues. What do you think?
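
A sketch of such a test, modeled on the `testMigrateWithBackupSuffix` test quoted above and assuming the same harness (`tableName`, `catalogName`, `temp`, and the `sql`/`scalarSql` helpers from the surrounding test class), might look like this; the method name is hypothetical:

```java
@Test
public void testMigrateWithDropBackupAndBackupSuffix() throws IOException {
  Assume.assumeTrue(catalogName.equals("spark_catalog"));
  String backupSuffix = "_tmp";
  String location = temp.newFolder().toString();
  sql(
      "CREATE TABLE %s (id bigint NOT NULL, data string) USING parquet LOCATION '%s'",
      tableName, location);
  sql("INSERT INTO TABLE %s VALUES (1, 'a')", tableName);

  // Pass both options in a single call to guard against future
  // regressions in how the two action configurations chain together.
  Object result =
      scalarSql(
          "CALL %s.system.migrate(table => '%s', drop_backup => true, backup_suffix => '%s')",
          catalogName, tableName, backupSuffix);

  Assert.assertEquals("Should have added one file", 1L, result);
  Assert.assertFalse(
      "Backup table should have been dropped",
      spark.catalog().tableExists(tableName + backupSuffix));
}
```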



##########
spark/v3.3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMigrateTableProcedure.java:
##########
@@ -122,6 +122,38 @@ public void testMigrateWithDropBackup() throws IOException {
     Assert.assertFalse(spark.catalog().tableExists(tableName + "_BACKUP_"));
   }
 
+  @Test
+  public void testMigrateWithBackupSuffix() throws IOException {
+    Assume.assumeTrue(catalogName.equals("spark_catalog"));
+    String backupSuffix = "_tmp";
+    String location = temp.newFolder().toString();
+    sql(
+        "CREATE TABLE %s (id bigint NOT NULL, data string) USING parquet LOCATION '%s'",
+        tableName, location);
+    sql("INSERT INTO TABLE %s VALUES (1, 'a')", tableName);
+
+    Object result =
+        scalarSql(
+            "CALL %s.system.migrate(table => '%s', backup_suffix => '%s')",
+            catalogName, tableName, backupSuffix);
+
+    Assert.assertEquals("Should have added one file", 1L, result);
+
+    Table createdTable = validationCatalog.loadTable(tableIdent);
+
+    String tableLocation = createdTable.location().replace("file:", "");
+    Assert.assertEquals("Table should have original location", location, tableLocation);
+
+    sql("INSERT INTO TABLE %s VALUES (1, 'a')", tableName);
+
+    assertEquals(
+        "Should have expected rows",
+        ImmutableList.of(row(1L, "a"), row(1L, "a")),
+        sql("SELECT * FROM %s ORDER BY id", tableName));
+
+    sql("DROP TABLE %s", tableName + backupSuffix);

Review Comment:
   Before you drop the backup table here, can you also assert that the data in 
the backup table remains intact after the migration i.e can you do something 
like
   
   assertEquals(
   "Should have expected rows",
   ImmutableList.of(row(1L, "a")),
   sql("SELECT * FROM %s ORDER BY id", tableName + backupSuffix));
   



##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/procedures/MigrateTableProcedure.java:
##########
@@ -91,9 +92,13 @@ public InternalRow[] call(InternalRow args) {
     }
 
     boolean dropBackup = args.isNullAt(2) ? false : args.getBoolean(2);
+    String backupSuffix = args.isNullAt(3) ? null : args.getString(3);
 
     MigrateTableSparkAction migrateTableSparkAction =
         SparkActions.get().migrateTable(tableName).tableProperties(properties);
+    if (backupSuffix != null) {

Review Comment:
   nit: newline before this.



##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/procedures/MigrateTableProcedure.java:
##########
@@ -91,9 +92,13 @@ public InternalRow[] call(InternalRow args) {
     }
 
     boolean dropBackup = args.isNullAt(2) ? false : args.getBoolean(2);
+    String backupSuffix = args.isNullAt(3) ? null : args.getString(3);
 
     MigrateTableSparkAction migrateTableSparkAction =
         SparkActions.get().migrateTable(tableName).tableProperties(properties);
+    if (backupSuffix != null) {
+      migrateTableSparkAction = migrateTableSparkAction.withBackupSuffix(backupSuffix);
+    }
 
     MigrateTable.Result result;

Review Comment:
   Can you also change this to the following, to match the style above:
   
   if (dropBackup) {
     migrateTableSparkAction = migrateTableSparkAction.dropBackup();
   }
   
   MigrateTable.Result result = migrateTableSparkAction.execute();
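
The concern behind this style is that a fluent configuration method may return a new instance rather than mutating its receiver, in which case discarding the return value silently loses the setting. A minimal self-contained illustration of that pitfall; `FluentAction` is a hypothetical stand-in, not the real `MigrateTableSparkAction` (which may well mutate in place, though reassignment keeps call sites correct either way):

```java
// Hypothetical stand-in for a fluent action whose configuration methods
// return a new, updated instance instead of mutating the receiver.
class FluentAction {
  final boolean dropBackup;
  final String backupSuffix;

  FluentAction(boolean dropBackup, String backupSuffix) {
    this.dropBackup = dropBackup;
    this.backupSuffix = backupSuffix;
  }

  // Returns a copy with the drop-backup flag enabled.
  FluentAction dropBackup() {
    return new FluentAction(true, this.backupSuffix);
  }

  // Returns a copy with the given backup suffix.
  FluentAction withBackupSuffix(String suffix) {
    return new FluentAction(this.dropBackup, suffix);
  }
}

public class Main {
  public static void main(String[] args) {
    FluentAction action = new FluentAction(false, null);

    // Discarding the return value silently loses the setting.
    action.dropBackup();
    System.out.println("after discard: " + action.dropBackup);

    // Reassigning, as the suggested style does, keeps every setting.
    action = action.withBackupSuffix("_tmp").dropBackup();
    System.out.println("after reassign: " + action.dropBackup);
    System.out.println("suffix: " + action.backupSuffix);
  }
}
```

Writing every configuration call as `action = action.configure(...)` is therefore the defensive choice: it is a no-op cost for mutating builders and a correctness fix for copying ones.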



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

