kbendick commented on a change in pull request #2916: URL: https://github.com/apache/iceberg/pull/2916#discussion_r685765465
##########
File path: core/src/main/java/org/apache/iceberg/io/BaseTaskWriter.java
##########
@@ -105,7 +112,11 @@ protected BaseEqualityDeltaWriter(StructLike partition, Schema schema, Schema de
       this.dataWriter = new RollingFileWriter(partition);
       this.eqDeleteWriter = new RollingEqDeleteWriter(partition);
-      this.posDeleteWriter = new SortedPosDeleteWriter<>(appenderFactory, fileFactory, format, partition);
+      this.posDeleteWriter = new SortedPosDeleteWriter<>(appenderFactory,
+          fileFactory,
+          format,
+          partition,
+          getRecordsNumThreshold(properties));

Review comment:
Nit: I find this function name confusing.

Also, we're now passing a potentially large table properties map across a large number of frameworks, where previously we only passed around Iceberg-specific classes. I'm not too worried about the size (though passing the whole table properties map around when only one field is currently needed does seem potentially wasteful). I'm more worried about serializability. Some maps built by Guava and similar libraries (ImmutableMap comes to mind) are sometimes not serializable, particularly with Kryo (which is not the default serializer but is the de facto default with Spark).

Do we have any tests to ensure these new changes can be serialized? Or at the least, if you're going to pass the whole table properties map around like this, can you check that serialization doesn't break when using Kryo? I think there are some unit tests that check Kryo and Java serde with Spark.

Essentially, just like with Flink, the driver needs to be able to serialize all of this code and send it to the executors (similar to how the job manager sends generated code to the task managers responsible for individual subtasks).

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
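To make the serialization concern in the review comment concrete, here is a minimal sketch of the two options it hints at: resolving the single needed value before anything is shipped to executors, or copying the properties into a plain `HashMap` and round-tripping it through Java serialization as a smoke test. The class name, the property key, and the default value are hypothetical and not taken from the PR. The sketch only exercises `java.io` serialization; a Kryo check would need the Kryo library and is not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class PropertiesSerializationSketch {

  // Hypothetical property key and default, for illustration only.
  static final String THRESHOLD_KEY = "example.pos-delete.records-num-threshold";
  static final long THRESHOLD_DEFAULT = 100_000L;

  // Option 1: resolve the one needed value up front, so only a long
  // (not the whole properties map) has to travel with the writer.
  static long recordsNumThreshold(Map<String, String> properties) {
    String value = properties.get(THRESHOLD_KEY);
    return value == null ? THRESHOLD_DEFAULT : Long.parseLong(value);
  }

  // Option 2: if the map itself must be passed, copy it into a plain
  // HashMap, which implements java.io.Serializable.
  static HashMap<String, String> serializableCopy(Map<String, String> properties) {
    return new HashMap<>(properties);
  }

  // Java-serialization round trip: write the object to bytes and read it back.
  @SuppressWarnings("unchecked")
  static <T extends Serializable> T roundTrip(T obj) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(obj);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (T) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("Object did not survive Java serialization", e);
    }
  }

  public static void main(String[] args) {
    Map<String, String> props = Collections.singletonMap(THRESHOLD_KEY, "5000");
    HashMap<String, String> copy = roundTrip(serializableCopy(props));
    System.out.println(recordsNumThreshold(copy));
  }
}
```

Resolving the value early sidesteps the question of which `Map` implementation ends up inside the writer. The copy-and-round-trip variant is only a smoke test for the Java serializer and does not replace the Kryo check asked for above.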
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org