[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #432: Allow writers to control size of files generated

GitBox Wed, 04 Sep 2019 16:28:12 -0700

rdblue commented on a change in pull request #432: Allow writers to control size of files generated
URL: https://github.com/apache/incubator-iceberg/pull/432#discussion_r321016003


 ##########
 File path: spark/src/main/java/org/apache/iceberg/spark/source/Writer.java
 ##########
 @@ -303,66 +305,128 @@ public String toString() {
         }
       }
     }
+
+    private class EncryptedOutputFileFactory implements OutputFileFactory<EncryptedOutputFile> {
+      private final int partitionId;
+      private final long taskId;
+      private final long epochId;
+
+      EncryptedOutputFileFactory(int partitionId, long taskId, long epochId) {
+        this.partitionId = partitionId;
+        this.taskId = taskId;
+        this.epochId = epochId;
+      }
+
+      private String generateFilename() {
+        return format.addExtension(String.format("%05d-%d-%s", partitionId, taskId, UUID.randomUUID().toString()));
 
 Review comment:
   The UUID should be the write's UUID, not a fresh random UUID per file, so 
that all of the files for a write can be located by path. The purpose is to be 
able to tell from two paths that they were written by the same operation. 
That's useful, for example, if a Spark job dies and leaves files in the file 
system: you can identify them all with a recursive listing and grep.
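
   A minimal sketch of what this could look like, not the PR's actual code. 
It assumes a UUID string generated once per write (for example with 
UUID.randomUUID().toString() on the driver) and passed to every task. The 
class name WriteScopedFilenames, the plain extension string standing in for 
format.addExtension, and the per-task fileCount suffix are all hypothetical. 
The counter is there because a task that rolls over to a second file would 
otherwise reuse the same name once the UUID is shared.

```java
// Hypothetical stand-in for the filename logic in the diff above.
final class WriteScopedFilenames {
  private final int partitionId;
  private final long taskId;
  private final String writeUuid;  // assumed: created once per write, shared by all tasks
  private final String extension;  // assumed: stand-in for format.addExtension(...)
  private int fileCount = 0;       // assumed: keeps names unique within one task

  WriteScopedFilenames(int partitionId, long taskId, String writeUuid, String extension) {
    this.partitionId = partitionId;
    this.taskId = taskId;
    this.writeUuid = writeUuid;
    this.extension = extension;
  }

  // Every name embeds the same write UUID; only the counter changes per file.
  String generateFilename() {
    String name = String.format("%05d-%d-%s-%05d", partitionId, taskId, writeUuid, fileCount);
    fileCount += 1;
    return name + extension;
  }

  // Two paths belong to the same write if both contain that write's UUID.
  static boolean writtenBy(String path, String writeUuid) {
    return path.contains(writeUuid);
  }
}
```

   With names built this way, the files left behind by a failed write all 
contain the same UUID, so a recursive listing filtered on that string finds 
them.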

