[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #4152: enable stream-results option to avoid spark driver oom

GitBox Tue, 22 Feb 2022 07:21:36 -0800


RussellSpitzer commented on a change in pull request #4152:
URL: https://github.com/apache/iceberg/pull/4152#discussion_r812059644




##########
File path: spark/v3.2/spark/src/test/java/org/apache/iceberg/spark/actions/TestExpireSnapshotsAction.java
##########
@@ -1027,34 +1029,59 @@ public void testExpireAction() {
   @Test
   public void testUseLocalIterator() {
     table.newFastAppend()
-        .appendFile(FILE_A)
-        .commit();
+            .appendFile(FILE_A)
+            .commit();
 
     table.newOverwrite()
-        .deleteFile(FILE_A)
-        .addFile(FILE_B)
-        .commit();
+            .deleteFile(FILE_A)
+            .addFile(FILE_B)
+            .commit();
 
     table.newFastAppend()
-        .appendFile(FILE_C)
-        .commit();
+            .appendFile(FILE_C)
+            .commit();
 
     long end = rightAfterSnapshot();
 
     int jobsBefore = spark.sparkContext().dagScheduler().nextJobId().get();
 
-    ExpireSnapshots.Result results =
-        SparkActions.get().expireSnapshots(table).expireOlderThan(end).execute();
+    AtomicReference<Integer> totalJobsRun = new AtomicReference<>();
 
-    Assert.assertEquals("Table does not have 1 snapshot after expiration", 1, Iterables.size(table.snapshots()));
+    withSQLConf(ImmutableMap.of("spark.sql.adaptive.enabled", "false"), () -> {
+      ExpireSnapshots.Result results =
+              SparkActions.get().expireSnapshots(table).expireOlderThan(end)
+                      .option("stream-results", "false").execute();
 
-    int jobsAfter = spark.sparkContext().dagScheduler().nextJobId().get();
-    int totalJobsRun = jobsAfter - jobsBefore;
+      int jobsAfter = spark.sparkContext().dagScheduler().nextJobId().get();
+      totalJobsRun.set(jobsAfter - jobsBefore);
 
-    checkExpirationResults(1L, 1L, 2L, results);
+      checkExpirationResults(1L, 1L, 2L, results);
+    });
+
+    table.newOverwrite()
+            .deleteFile(FILE_C)
+            .addFile(FILE_B)
+            .commit();
+
+    long endAgain = rightAfterSnapshot();
+
+    int jobsBeforeAgain = spark.sparkContext().dagScheduler().nextJobId().get();

Review comment:
       Please use more descriptive names here, e.g. jobsBeforeStreamResults, etc.
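
A minimal sketch of what the naming suggestion could look like, assuming the two measurements are the runs with and without `stream-results`. The helper `jobsRunDuring`, the class name, and the fake counter are hypothetical and not part of the PR; in the real test the supplier would be `spark.sparkContext().dagScheduler().nextJobId().get()`, and the increments below are stand-in values, not measured job counts.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

public class JobCountSketch {

  // Hypothetical helper: reads the "next job id" before and after the action
  // and returns the difference, i.e. how many jobs the action started.
  public static int jobsRunDuring(IntSupplier nextJobId, Runnable action) {
    int jobsBefore = nextJobId.getAsInt();
    action.run();
    int jobsAfter = nextJobId.getAsInt();
    return jobsAfter - jobsBefore;
  }

  public static void main(String[] args) {
    // Stand-in for the Spark scheduler's job id counter (assumption for illustration).
    AtomicInteger fakeNextJobId = new AtomicInteger(7);

    // Names say which configuration was measured, instead of "...Again".
    int jobsRunWithoutStreamResults =
        jobsRunDuring(fakeNextJobId::get, () -> fakeNextJobId.addAndGet(3));
    int jobsRunWithStreamResults =
        jobsRunDuring(fakeNextJobId::get, () -> fakeNextJobId.addAndGet(5));

    System.out.println("without stream-results: " + jobsRunWithoutStreamResults);
    System.out.println("with stream-results: " + jobsRunWithStreamResults);
  }
}
```

Returning the count from a helper would also avoid the `AtomicReference<Integer>` holders, though whether that fits with `withSQLConf` in the actual test is left open here.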

##########
File path: spark/v3.2/spark/src/test/java/org/apache/iceberg/spark/actions/TestExpireSnapshotsAction.java
##########
@@ -1027,34 +1029,59 @@ public void testExpireAction() {
   @Test
   public void testUseLocalIterator() {
     table.newFastAppend()
-        .appendFile(FILE_A)
-        .commit();
+            .appendFile(FILE_A)
+            .commit();
 
     table.newOverwrite()
-        .deleteFile(FILE_A)
-        .addFile(FILE_B)
-        .commit();
+            .deleteFile(FILE_A)
+            .addFile(FILE_B)
+            .commit();
 
     table.newFastAppend()
-        .appendFile(FILE_C)
-        .commit();
+            .appendFile(FILE_C)
+            .commit();
 
     long end = rightAfterSnapshot();
 
     int jobsBefore = spark.sparkContext().dagScheduler().nextJobId().get();
 
-    ExpireSnapshots.Result results =
-        SparkActions.get().expireSnapshots(table).expireOlderThan(end).execute();
+    AtomicReference<Integer> totalJobsRun = new AtomicReference<>();
 
-    Assert.assertEquals("Table does not have 1 snapshot after expiration", 1, Iterables.size(table.snapshots()));
+    withSQLConf(ImmutableMap.of("spark.sql.adaptive.enabled", "false"), () -> {
+      ExpireSnapshots.Result results =
+              SparkActions.get().expireSnapshots(table).expireOlderThan(end)
+                      .option("stream-results", "false").execute();
 
-    int jobsAfter = spark.sparkContext().dagScheduler().nextJobId().get();
-    int totalJobsRun = jobsAfter - jobsBefore;
+      int jobsAfter = spark.sparkContext().dagScheduler().nextJobId().get();
+      totalJobsRun.set(jobsAfter - jobsBefore);
 
-    checkExpirationResults(1L, 1L, 2L, results);
+      checkExpirationResults(1L, 1L, 2L, results);
+    });
+
+    table.newOverwrite()
+            .deleteFile(FILE_C)
+            .addFile(FILE_B)
+            .commit();
+
+    long endAgain = rightAfterSnapshot();
+
+    int jobsBeforeAgain = spark.sparkContext().dagScheduler().nextJobId().get();
+
+    AtomicReference<Integer> totalJobsRunAgain = new AtomicReference<>();
+
+    withSQLConf(ImmutableMap.of("spark.sql.adaptive.enabled", "false"), () -> {
+      ExpireSnapshots.Result results =
+              SparkActions.get().expireSnapshots(table).expireOlderThan(endAgain)

Review comment:
       Formatting is incorrect here as well




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


