kbendick commented on a change in pull request #2916:
URL: https://github.com/apache/iceberg/pull/2916#discussion_r685765465



##########
File path: core/src/main/java/org/apache/iceberg/io/BaseTaskWriter.java
##########
@@ -105,7 +112,11 @@ protected BaseEqualityDeltaWriter(StructLike partition, Schema schema, Schema de
 
       this.dataWriter = new RollingFileWriter(partition);
       this.eqDeleteWriter = new RollingEqDeleteWriter(partition);
-      this.posDeleteWriter = new SortedPosDeleteWriter<>(appenderFactory, fileFactory, format, partition);
+      this.posDeleteWriter = new SortedPosDeleteWriter<>(appenderFactory,
+          fileFactory,
+          format,
+          partition,
+          getRecordsNumThreshold(properties));

Review comment:
Nit: I find this function name (getRecordsNumThreshold) confusing.
   
Also, we're now passing a potentially large table properties map through a large number of frameworks, where previously we only passed Iceberg-specific classes. I'm not so worried about the size, though it does seem wasteful to ship the whole table properties map when only one field is currently needed. I'm more worried about serializability. Some maps created by Guava and similar libraries (ImmutableMap comes to mind) can fail to serialize, particularly with Kryo, which is not Spark's default serializer but is the de facto default in practice.
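One way to avoid capturing the map at all is to resolve the single value on the driver and hand only a primitive to the writer. The sketch below is hypothetical: the class name, the property key "write.pos-delete.records-threshold" and the 100,000 default are assumptions for illustration, not Iceberg's actual TableProperties.

```java
import java.util.Map;

/**
 * Hypothetical sketch: read the one setting the writer needs up front, so the
 * serialized writer only carries a long instead of the whole properties map.
 */
public final class PosDeleteThreshold {
  // Assumed key and default, for illustration only.
  public static final String RECORDS_THRESHOLD_KEY = "write.pos-delete.records-threshold";
  public static final long RECORDS_THRESHOLD_DEFAULT = 100_000L;

  private PosDeleteThreshold() {}

  /** Returns the configured threshold, or the default when the key is absent. */
  public static long recordsThreshold(Map<String, String> tableProperties) {
    String value = tableProperties.get(RECORDS_THRESHOLD_KEY);
    if (value == null) {
      return RECORDS_THRESHOLD_DEFAULT;
    }
    // Long.parseLong throws NumberFormatException (an IllegalArgumentException)
    // for non-numeric input.
    long parsed = Long.parseLong(value.trim());
    if (parsed <= 0) {
      throw new IllegalArgumentException(
          RECORDS_THRESHOLD_KEY + " must be positive, got: " + value);
    }
    return parsed;
  }
}
```

The resolved long would then be the fifth constructor argument to SortedPosDeleteWriter in the hunk, so the writer never holds a reference to the properties map and the Kryo question for that field goes away.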
   
Do we have any tests to ensure these new changes can be serialized? At the least, if you're going to pass the whole table properties map around like this, can you check that serialization doesn't break when using Kryo? I think there are some unit tests for checking Kryo and Java serde with Spark. Essentially, just as with Flink, the driver needs to be able to serialize all of this code and send it to the executors (similar to how the JobManager sends generated code to the TaskManagers responsible for individual subtasks).
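A serialization test of the kind requested could start from a plain Java serialization round trip. This is a minimal sketch under stated assumptions: WriterConfig and roundTrip are hypothetical names, and it only exercises java.io serialization. A Kryo check would need Kryo (or Spark's serializer) on the classpath and is not covered here, even though Kryo is where the comment expects ImmutableMap-style maps to cause trouble.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: Java-serialization round trip for a writer config. */
public final class SerdeRoundTrip {
  private SerdeRoundTrip() {}

  /** Hypothetical config object that carries the table properties. */
  public static final class WriterConfig implements Serializable {
    private static final long serialVersionUID = 1L;
    private final Map<String, String> properties;

    public WriterConfig(Map<String, String> properties) {
      // Defensive copy into a HashMap, which implements java.io.Serializable,
      // so the config does not depend on the caller's Map implementation.
      this.properties = new HashMap<>(properties);
    }

    public Map<String, String> properties() {
      return properties;
    }
  }

  /** Serializes obj to bytes with ObjectOutputStream, then reads it back. */
  @SuppressWarnings("unchecked")
  public static <T extends Serializable> T roundTrip(T obj)
      throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(obj);
    }
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      return (T) in.readObject();
    }
  }
}
```

A Kryo variant would keep the same serialize-then-deserialize shape and the same equality check, swapping only the serializer.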




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


