[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #3292: Spark: Compact Medium Size Files (#460)

GitBox Wed, 17 Nov 2021 10:36:46 -0800


RussellSpitzer commented on a change in pull request #3292:
URL: https://github.com/apache/iceberg/pull/3292#discussion_r751532891




##########
File path: api/src/main/java/org/apache/iceberg/FileFormat.java
##########
@@ -25,23 +25,29 @@
  * Enum of supported file formats.
  */
 public enum FileFormat {
-  ORC("orc", true),
-  PARQUET("parquet", true),
-  AVRO("avro", true),
-  METADATA("metadata.json", false);
+  ORC("orc", true, true),
+  PARQUET("parquet", true, true),
+  AVRO("avro", true, false),
+  METADATA("metadata.json", false, false);
 
   private final String ext;
   private final boolean splittable;
+  private final boolean offsets;
 
-  FileFormat(String ext, boolean splittable) {
+  FileFormat(String ext, boolean splittable, boolean offsets) {
     this.ext = "." + ext;
     this.splittable = splittable;
+    this.offsets = offsets;
   }
 
   public boolean isSplittable() {
     return splittable;
   }
 
+  public boolean hasOffsets() {
+    return offsets;

Review comment:
       This seemed much more confusing to me because of the split behavior on 
files without split offset information. I'd rather it be explicit: we have 
files that we know how to split but that can only be split on offsets, and we 
have files that we know how to split and that can be split arbitrarily. I was 
trying to figure out another term for this. Basically, I want to differentiate 
between formats like ORC and Parquet on one side and Avro on the other.
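
One way the explicit distinction could be spelled out is with a single three-valued property instead of two booleans. This is only a rough sketch and not code from the PR: `SplitBehavior` and `SketchFileFormat` are hypothetical names, and the mapping of formats to behaviors assumes the flag values shown in the diff above.

```java
// Hypothetical sketch, not code from the PR: SplitBehavior and
// SketchFileFormat are illustrative names only.
enum SplitBehavior {
  NONE,        // cannot be split at all
  ON_OFFSETS,  // can be split, but only at recorded split offsets
  ARBITRARY    // can be split at any position
}

enum SketchFileFormat {
  // Mapping mirrors the (splittable, offsets) flags in the diff above.
  ORC("orc", SplitBehavior.ON_OFFSETS),
  PARQUET("parquet", SplitBehavior.ON_OFFSETS),
  AVRO("avro", SplitBehavior.ARBITRARY),
  METADATA("metadata.json", SplitBehavior.NONE);

  private final String ext;
  private final SplitBehavior splitBehavior;

  SketchFileFormat(String ext, SplitBehavior splitBehavior) {
    this.ext = "." + ext;
    this.splitBehavior = splitBehavior;
  }

  String ext() {
    return ext;
  }

  SplitBehavior splitBehavior() {
    return splitBehavior;
  }

  // Equivalent of the existing isSplittable() flag.
  boolean isSplittable() {
    return splitBehavior != SplitBehavior.NONE;
  }

  // Equivalent of the hasOffsets() flag proposed in the diff.
  boolean hasOffsets() {
    return splitBehavior == SplitBehavior.ON_OFFSETS;
  }
}
```

With this shape, a caller can switch on `splitBehavior()` directly, and the combination "has offsets but is not splittable" cannot be expressed at all.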




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.