[GitHub] [iceberg] rdblue commented on a change in pull request #3292: Spark: Compact Medium Size Files (#460)

GitBox Wed, 17 Nov 2021 10:32:29 -0800


rdblue commented on a change in pull request #3292:
URL: https://github.com/apache/iceberg/pull/3292#discussion_r751529871




##########
File path: api/src/main/java/org/apache/iceberg/FileFormat.java
##########
@@ -25,23 +25,29 @@
  * Enum of supported file formats.
  */
 public enum FileFormat {
-  ORC("orc", true),
-  PARQUET("parquet", true),
-  AVRO("avro", true),
-  METADATA("metadata.json", false);
+  ORC("orc", true, true),
+  PARQUET("parquet", true, true),
+  AVRO("avro", true, false),
+  METADATA("metadata.json", false, false);
 
   private final String ext;
   private final boolean splittable;
+  private final boolean offsets;
 
-  FileFormat(String ext, boolean splittable) {
+  FileFormat(String ext, boolean splittable, boolean offsets) {
     this.ext = "." + ext;
     this.splittable = splittable;
+    this.offsets = offsets;
   }
 
   public boolean isSplittable() {
     return splittable;
   }
 
+  public boolean hasOffsets() {
+    return offsets;

Review comment:
       Don't we want to treat every file the same? If a file has split offsets,
we use them; if not, we try to split at the target size. That depends on the
file's metadata, not on its format.
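
A minimal sketch of the format-agnostic behavior the comment describes, to make the suggestion concrete. `SplitSketch` and `planSplits` are hypothetical names, not part of Iceberg or of this PR. The sketch assumes the split offsets read from file metadata are handed over as a list sorted in ascending order, and that a null or empty list means the file recorded none.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not Iceberg API.
final class SplitSketch {
  private SplitSketch() {
  }

  // Returns {start, length} pairs covering one file.
  // Assumption: splitOffsets, when present, is sorted ascending and lies within the file.
  static List<long[]> planSplits(long fileLength, List<Long> splitOffsets, long targetSize) {
    if (targetSize <= 0) {
      throw new IllegalArgumentException("targetSize must be positive");
    }

    List<long[]> splits = new ArrayList<>();
    if (splitOffsets != null && !splitOffsets.isEmpty()) {
      // The file metadata carries offsets: cut at each one, whatever the format.
      for (int i = 0; i < splitOffsets.size(); i++) {
        long start = splitOffsets.get(i);
        long end = i + 1 < splitOffsets.size() ? splitOffsets.get(i + 1) : fileLength;
        splits.add(new long[] {start, end - start});
      }
    } else {
      // No offsets recorded: fall back to fixed-size cuts at the target size.
      for (long start = 0; start < fileLength; start += targetSize) {
        splits.add(new long[] {start, Math.min(targetSize, fileLength - start)});
      }
    }
    return splits;
  }
}
```

With this shape the decision depends only on whether offsets are present for the file, so a per-format flag such as `hasOffsets()` would not be consulted.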




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
