aokolnychyi commented on a change in pull request #2210:
URL: https://github.com/apache/iceberg/pull/2210#discussion_r590982468



##########
File path: spark/src/main/java/org/apache/iceberg/spark/SparkTableUtil.java
##########
@@ -522,14 +523,32 @@ public static void importSparkTable(
         importUnpartitionedSparkTable(spark, sourceTableIdentWithDB, targetTable);
       } else {
         List<SparkPartition> sourceTablePartitions = getPartitions(spark, sourceTableIdent);
-        importSparkPartitions(spark, sourceTablePartitions, targetTable, spec, stagingDir);
+        List<SparkPartition> filteredPartitions = filterPartitions(sourceTablePartitions, partitionFilter);
+        importSparkPartitions(spark, filteredPartitions, targetTable, spec, stagingDir);
       }
     } catch (AnalysisException e) {
       throw SparkExceptionUtil.toUncheckedException(
           e, "Unable to get partition spec for table: %s", sourceTableIdentWithDB);
     }
   }
 
+  /**
+   * Import files from an existing Spark table to an Iceberg table.
+   *
+   * The import uses the Spark session to get table metadata. It assumes that
+   * no other operation is modifying the source or target table, and is
+   * therefore not thread-safe.
+   *
+   * @param spark a Spark session
+   * @param sourceTableIdent an identifier of the source Spark table
+   * @param targetTable an Iceberg table where to import the data
+   * @param stagingDir a staging directory to store temporary manifest files
+   */
+  public static void importSparkTable(
+      SparkSession spark, TableIdentifier sourceTableIdent, Table targetTable, String stagingDir) {

Review comment:
       nit: I like your arg formatting for the method above a bit more, but I understand this file is not consistent.
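For context, the `filterPartitions` call added in this diff presumably keeps only the source partitions whose values match every entry in the supplied partition filter. A minimal, self-contained sketch of that matching logic (the class, method signature, and map-based partition representation here are hypothetical, not the actual `SparkTableUtil` types):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartitionFilterSketch {

  // Keep only partitions whose values contain every key/value pair in the
  // filter. An empty filter matches all partitions. This is an illustrative
  // stand-in for the filterPartitions step in the diff, with partitions
  // modeled as plain maps of partition column name to value.
  public static List<Map<String, String>> filterPartitions(
      List<Map<String, String>> partitions, Map<String, String> filter) {
    List<Map<String, String>> matched = new ArrayList<>();
    for (Map<String, String> partition : partitions) {
      if (partition.entrySet().containsAll(filter.entrySet())) {
        matched.add(partition);
      }
    }
    return matched;
  }

  public static void main(String[] args) {
    Map<String, String> p1 = new HashMap<>();
    p1.put("dt", "2021-03-01");
    p1.put("hour", "00");

    Map<String, String> p2 = new HashMap<>();
    p2.put("dt", "2021-03-02");
    p2.put("hour", "00");

    Map<String, String> filter = new HashMap<>();
    filter.put("dt", "2021-03-01");

    List<Map<String, String>> result =
        filterPartitions(Arrays.asList(p1, p2), filter);
    System.out.println(result.size()); // prints 1
  }
}
```

With this shape, passing an empty filter map preserves the existing behavior of importing every partition, which is why the old single-call path can be routed through the new filtered path.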

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
