aokolnychyi commented on a change in pull request #2256:
URL: https://github.com/apache/iceberg/pull/2256#discussion_r579516592
##########
File path: spark/src/main/java/org/apache/iceberg/actions/BaseSparkAction.java
##########
@@ -101,47 +117,43 @@
return allManifests.flatMap(new ReadManifest(ioBroadcast),
Encoders.STRING()).toDF("file_path");
}
- protected Dataset<Row> buildManifestFileDF(SparkSession spark, String
tableName) {
- return loadMetadataTable(spark, tableName, table().location(),
ALL_MANIFESTS).selectExpr("path as file_path");
+ protected Dataset<Row> buildManifestFileDF(Table table) {
+ return loadMetadataTable(table, ALL_MANIFESTS).selectExpr("path as
file_path");
}
- protected Dataset<Row> buildManifestListDF(SparkSession spark, Table table) {
+ protected Dataset<Row> buildManifestListDF(Table table) {
List<String> manifestLists = getManifestListPaths(table.snapshots());
return spark.createDataset(manifestLists,
Encoders.STRING()).toDF("file_path");
}
- protected Dataset<Row> buildManifestListDF(SparkSession spark, String
metadataFileLocation) {
- StaticTableOperations ops = new
StaticTableOperations(metadataFileLocation, table().io());
- return buildManifestListDF(spark, new BaseTable(ops, table().name()));
- }
-
- protected Dataset<Row> buildOtherMetadataFileDF(SparkSession spark,
TableOperations ops) {
+ protected Dataset<Row> buildOtherMetadataFileDF(TableOperations ops) {
List<String> otherMetadataFiles = getOtherMetadataFilePaths(ops);
return spark.createDataset(otherMetadataFiles,
Encoders.STRING()).toDF("file_path");
}
- protected Dataset<Row> buildValidMetadataFileDF(SparkSession spark, Table
table, TableOperations ops) {
- Dataset<Row> manifestDF = buildManifestFileDF(spark, table.name());
- Dataset<Row> manifestListDF = buildManifestListDF(spark, table);
- Dataset<Row> otherMetadataFileDF = buildOtherMetadataFileDF(spark, ops);
+ protected Dataset<Row> buildValidMetadataFileDF(Table table, TableOperations
ops) {
+ Dataset<Row> manifestDF = buildManifestFileDF(table);
+ Dataset<Row> manifestListDF = buildManifestListDF(table);
+ Dataset<Row> otherMetadataFileDF = buildOtherMetadataFileDF(ops);
return manifestDF.union(otherMetadataFileDF).union(manifestListDF);
}
// Attempt to use Spark3 Catalog resolution if available on the path
private static final DynMethods.UnboundMethod LOAD_CATALOG =
DynMethods.builder("loadCatalogMetadataTable")
- .hiddenImpl("org.apache.iceberg.spark.Spark3Util",
- SparkSession.class, String.class, MetadataTableType.class)
+ .hiddenImpl("org.apache.iceberg.spark.Spark3Util", SparkSession.class,
String.class, MetadataTableType.class)
.orNoop()
.build();
- private static Dataset<Row> loadCatalogMetadataTable(SparkSession spark,
String tableName, MetadataTableType type) {
+ private Dataset<Row> loadCatalogMetadataTable(String tableName,
MetadataTableType type) {
Preconditions.checkArgument(!LOAD_CATALOG.isNoop(), "Cannot find
Spark3Util class but Spark3 is in use");
return LOAD_CATALOG.asStatic().invoke(spark, tableName, type);
}
- protected static Dataset<Row> loadMetadataTable(SparkSession spark, String
tableName, String tableLocation,
Review comment:
Previously, there was an inconsistency in what values we passed to this
method.
For example, callers sometimes passed a metadata file location as the
`tableName` argument while passing the table's location as the root table
location. I changed the method to accept the `Table` object directly to
avoid any surprises.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]