rdblue commented on a change in pull request #432: Allow writers to control size of files generated
URL: https://github.com/apache/incubator-iceberg/pull/432#discussion_r321016003
##########
File path: spark/src/main/java/org/apache/iceberg/spark/source/Writer.java
##########
@@ -303,66 +305,128 @@ public String toString() {
}
}
}
+
+ private class EncryptedOutputFileFactory implements OutputFileFactory<EncryptedOutputFile> {
+ private final int partitionId;
+ private final long taskId;
+ private final long epochId;
+
+ EncryptedOutputFileFactory(int partitionId, long taskId, long epochId) {
+ this.partitionId = partitionId;
+ this.taskId = taskId;
+ this.epochId = epochId;
+ }
+
+ private String generateFilename() {
+ return format.addExtension(String.format("%05d-%d-%s", partitionId, taskId, UUID.randomUUID().toString()));
Review comment:
The UUID should be the write's UUID, shared by every file that write produces,
rather than a new random UUID per file, so that all of the files for a write
can be located by path. The purpose is to be able to tell from two paths that
they were written by the same operation. That is useful, for example, when a
Spark job dies and leaves files behind in the file system: you can identify all
of them with a recursive listing and grep.
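A minimal Java sketch of the suggestion, assuming a hypothetical helper class `WriteFileNames` that is not part of Iceberg. One UUID is generated per write and shared by every task, so each file name carries it. Because a task may now roll over to several files once file size is capped, the sketch also appends a per-task file counter; that counter is an assumption added to keep names unique, not something stated in the review. `filesForWrite` is a hypothetical stand-in for the recursive listing and grep.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.UUID;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not Iceberg's API): names data files with one UUID per
// write operation instead of a fresh random UUID per file.
final class WriteFileNames {
  private final String writeUUID;

  WriteFileNames(String writeUUID) {
    this.writeUUID = writeUUID;
  }

  // Generate the UUID once per write (e.g. on the driver) and share it with all tasks.
  static WriteFileNames forNewWrite() {
    return new WriteFileNames(UUID.randomUUID().toString());
  }

  // Same "%05d-%d-%s" prefix as the diff, but %s is the shared write UUID.
  // The trailing fileCount is an assumption: with a size cap a task can emit
  // several files, so partition + task + write UUID alone would collide.
  String fileName(int partitionId, long taskId, int fileCount, String extension) {
    return String.format("%05d-%d-%s-%05d.%s",
        partitionId, taskId, writeUUID, fileCount, extension);
  }

  // Java stand-in for "recursive listing and grep": every regular file under
  // root whose name contains the given write UUID, in sorted order.
  static List<Path> filesForWrite(Path root, String writeUUID) {
    try (Stream<Path> paths = Files.walk(root)) {
      return paths
          .filter(Files::isRegularFile)
          .filter(p -> p.getFileName().toString().contains(writeUUID))
          .sorted()
          .collect(Collectors.toList());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    WriteFileNames names = forNewWrite();
    // Files from different tasks of the same write share the UUID segment.
    System.out.println(names.fileName(0, 12L, 1, "parquet"));
    System.out.println(names.fileName(1, 13L, 1, "parquet"));
  }
}
```

The grouping only works if the same `writeUUID` reaches every task; a random UUID per file, as in the diff above, keeps names unique but gives no way to tie leftover files back to the write that produced them.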