rdblue commented on a change in pull request #432: Allow writers to control size of files generated
URL: https://github.com/apache/incubator-iceberg/pull/432#discussion_r326401551
 
 

 ##########
 File path: spark/src/main/java/org/apache/iceberg/spark/source/Writer.java
 ##########
 @@ -303,96 +305,179 @@ public String toString() {
         }
       }
     }
+
+    private class EncryptedOutputFileFactory implements OutputFileFactory<EncryptedOutputFile> {
+      private final int partitionId;
+      private final long taskId;
+      private final long epochId;
+      // The purpose of this uuid is to be able to know from two paths that they were written by the same operation.
+      // That's useful, for example, if a Spark job dies and leaves files in the file system, you can identify them all
+      // with a recursive listing and grep.
+      private final String uuid = UUID.randomUUID().toString();
+      private int fileCount;
+
+      EncryptedOutputFileFactory(int partitionId, long taskId, long epochId) {
+        this.partitionId = partitionId;
+        this.taskId = taskId;
+        this.epochId = epochId;
+        this.fileCount = 0;
+      }
+
+      private synchronized String generateFilename() {
 
 Review comment:
   I don't think this should be synchronized unless `fileCount` is volatile. 
This doesn't need to be synchronized anyway because each write is 
single-threaded. I would just remove this to make it a bit simpler.
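
   A minimal sketch of the unsynchronized version, for illustration only. 
`FilenameGenerator` and the filename format below are hypothetical stand-ins, 
not the code in this PR, and the sketch assumes each instance is used by 
exactly one task thread:

```java
import java.util.Locale;
import java.util.UUID;

// Hypothetical stand-in for the factory in the diff above; the filename
// layout is an assumption chosen only to make the example concrete.
final class FilenameGenerator {
  private final int partitionId;
  private final long taskId;
  // One UUID per instance, so files from the same operation share it.
  private final String uuid = UUID.randomUUID().toString();
  // Plain int, no synchronization: assumes a single thread owns this instance.
  private int fileCount = 0;

  FilenameGenerator(int partitionId, long taskId) {
    this.partitionId = partitionId;
    this.taskId = taskId;
  }

  // Returns a new name on every call by advancing the per-instance counter.
  String generateFilename() {
    String name = String.format(
        Locale.ROOT, "%05d-%d-%s-%05d", partitionId, taskId, uuid, fileCount);
    fileCount += 1;
    return name;
  }
}
```

   If an instance were ever shared across threads, the counter would need 
protection again, so the single-threaded assumption is worth a comment in the 
class.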
