aokolnychyi commented on code in PR #7692:
URL: https://github.com/apache/iceberg/pull/7692#discussion_r1203260409


##########
spark/v3.4/spark/src/jmh/java/org/apache/iceberg/spark/source/WritersBenchmark.java:
##########
@@ -363,6 +389,60 @@ public void writeUnpartitionedClusteredPositionDeleteWriter(Blackhole blackhole)
     blackhole.consume(writer);
   }
 
+  @Benchmark
+  @Threads(1)
+  public void writeUnpartitionedFanoutPositionDeleteWriter(Blackhole blackhole) throws IOException {
+    FileIO io = table().io();
+
+    OutputFileFactory fileFactory = newFileFactory();
+    SparkFileWriterFactory writerFactory =
+        SparkFileWriterFactory.builderFor(table()).dataFileFormat(fileFormat()).build();
+
+    FanoutPositionOnlyDeleteWriter<InternalRow> writer =
+        new FanoutPositionOnlyDeleteWriter<>(
+            writerFactory, fileFactory, io, TARGET_FILE_SIZE_IN_BYTES);
+
+    PositionDelete<InternalRow> positionDelete = PositionDelete.create();
+    try (FanoutPositionOnlyDeleteWriter<InternalRow> closeableWriter = writer) {
+      for (InternalRow row : positionDeleteRows) {
+        String path = row.getString(0);
+        long pos = row.getLong(1);
+        positionDelete.set(path, pos, null);
+        closeableWriter.write(positionDelete, unpartitionedSpec, null);
+      }
+    }
+
+    blackhole.consume(writer);
+  }
+
+  @Benchmark
+  @Threads(1)
+  public void writeUnpartitionedFanoutPositionDeleteWriterShuffled(Blackhole blackhole)

Review Comment:
   We should expect 5-15% overhead for the new buffering writer. That can
still be a net win for the job if we skip local ordering for inserts and
potentially avoid spilling. Note that this benchmark does not account for the
cost of ordering records; it only measures write performance. We will use this
writer only if fanout is enabled. We should also explore Puffin delete files
that would persist bitmaps directly.
   
   ```
   Benchmark                                                                     Mode  Cnt  Score    Error  Units
   ParquetWritersBenchmark.writeUnpartitionedClusteredPositionDeleteWriter         ss    5  6.004 ±  0.185   s/op
   ParquetWritersBenchmark.writeUnpartitionedFanoutPositionDeleteWriter            ss    5  6.503 ±  0.171   s/op
   ParquetWritersBenchmark.writeUnpartitionedFanoutPositionDeleteWriterShuffled    ss    5  6.616 ±  0.204   s/op
   ```
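
As a quick sanity check on the 5-15% estimate, the scores in the table can be plugged into a relative-overhead calculation. The sketch below is illustrative only: the class and method names are hypothetical and not part of the PR, and it compares mean scores without considering the reported error margins.

```java
import java.util.Locale;

public class OverheadCheck {

  // Relative slowdown of a candidate versus a baseline, in percent.
  // Both arguments are mean scores in s/op (lower is better).
  static double overheadPercent(double baseline, double candidate) {
    return (candidate - baseline) / baseline * 100.0;
  }

  public static void main(String[] args) {
    // Mean scores copied from the benchmark output above.
    double clustered = 6.004;
    double fanout = 6.503;
    double fanoutShuffled = 6.616;

    System.out.printf(
        Locale.ROOT, "fanout vs clustered: %.1f%%%n", overheadPercent(clustered, fanout));
    System.out.printf(
        Locale.ROOT,
        "fanout (shuffled) vs clustered: %.1f%%%n",
        overheadPercent(clustered, fanoutShuffled));
  }
}
```

With these inputs the two ratios come out at roughly 8% and 10%, which falls inside the quoted range.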



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

