Re: [PR] Core: Update variant class visibility [iceberg]

via GitHub Mon, 27 Jan 2025 22:10:35 -0800


aihuaxu commented on code in PR #12105:
URL: https://github.com/apache/iceberg/pull/12105#discussion_r1931574393



##########
core/src/main/java/org/apache/iceberg/variants/PrimitiveWrapper.java:
##########
@@ -47,17 +48,23 @@ class PrimitiveWrapper<T> implements VariantPrimitive<T> {
   private static final byte BINARY_HEADER = VariantUtil.primitiveHeader(Primitives.TYPE_BINARY);
   private static final byte STRING_HEADER = VariantUtil.primitiveHeader(Primitives.TYPE_STRING);
 
-  private final Variants.PhysicalType type;
+  private final PhysicalType type;
   private final T value;
   private ByteBuffer buffer = null;
 
-  PrimitiveWrapper(Variants.PhysicalType type, T value) {
-    this.type = type;
+  PrimitiveWrapper(PhysicalType type, T value) {
+    if (value instanceof Boolean

Review Comment:
   Yeah, you are right. The `BOOLEAN_TRUE` and `BOOLEAN_FALSE` physical types need special handling: when we shred them, they should be grouped together and treated as a single type.
   
   We may also need a separate type list {`NULL`, `BOOLEAN`, `INT8`, `INT16`, etc.} that is almost the same as the physical type list, except that a single `BOOLEAN` stands in for both `BOOLEAN_TRUE` and `BOOLEAN_FALSE`. Otherwise, we can't represent the type of a shredded column that holds true/false values.
   
   I'm fine with keeping it simple for now. We can revisit this if needed.
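A minimal sketch of that idea, assuming a trimmed-down pair of enums. `PhysicalType`, `ShreddedType`, `shreddedTypeOf`, and `columnTypes` here are hypothetical stand-ins, not Iceberg's actual classes, and the real physical type list is longer. Every physical type maps to a shredded type of the same name, except that `BOOLEAN_TRUE` and `BOOLEAN_FALSE` both map to `BOOLEAN`, so a column holding both values needs only one type.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class ShreddedTypeSketch {
  // Hypothetical, trimmed-down physical type list (illustrative only; not Iceberg's enum).
  enum PhysicalType {
    NULL, BOOLEAN_TRUE, BOOLEAN_FALSE, INT8, INT16, INT32, INT64, STRING
  }

  // Hypothetical type list for shredded columns: same as PhysicalType,
  // except BOOLEAN_TRUE and BOOLEAN_FALSE collapse into a single BOOLEAN.
  enum ShreddedType {
    NULL, BOOLEAN, INT8, INT16, INT32, INT64, STRING
  }

  static ShreddedType shreddedTypeOf(PhysicalType type) {
    switch (type) {
      case BOOLEAN_TRUE:
      case BOOLEAN_FALSE:
        return ShreddedType.BOOLEAN;
      default:
        // Every other constant has the same name in both enums.
        return ShreddedType.valueOf(type.name());
    }
  }

  // Collects the distinct shredded types needed to describe a column of values.
  static Set<ShreddedType> columnTypes(List<PhysicalType> values) {
    Set<ShreddedType> types = EnumSet.noneOf(ShreddedType.class);
    for (PhysicalType value : values) {
      types.add(shreddedTypeOf(value));
    }
    return types;
  }

  public static void main(String[] args) {
    // A column mixing true and false values needs only one shredded type.
    List<PhysicalType> column =
        List.of(PhysicalType.BOOLEAN_TRUE, PhysicalType.BOOLEAN_FALSE, PhysicalType.BOOLEAN_TRUE);
    System.out.println(columnTypes(column)); // [BOOLEAN]
  }
}
```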



##########
core/src/main/java/org/apache/iceberg/variants/ShreddedObject.java:
##########
@@ -35,22 +39,55 @@
  * fields. This also does not allow updating or replacing the metadata for the unshredded object,
  * which could require recursively rewriting field IDs.
  */
-class ShreddedObject implements VariantObject {
-  private final SerializedMetadata metadata;
-  private final SerializedObject unshredded;
+public class ShreddedObject implements VariantObject {
+  private final VariantMetadata metadata;
+  private final VariantObject unshredded;
   private final Map<String, VariantValue> shreddedFields = Maps.newHashMap();
+  private final Set<String> removedFields = Sets.newHashSet();
   private SerializationState serializationState = null;
 
-  ShreddedObject(SerializedMetadata metadata) {
+  ShreddedObject(VariantMetadata metadata) {
     this.metadata = metadata;
     this.unshredded = null;
   }
 
-  ShreddedObject(SerializedObject unshredded) {
-    this.metadata = unshredded.metadata();
+  ShreddedObject(VariantMetadata metadata, VariantObject unshredded) {
+    this.metadata = metadata;
     this.unshredded = unshredded;
   }
 
+  @VisibleForTesting
+  VariantMetadata metadata() {
+    return metadata;
+  }
+
+  private Set<String> nameSet() {
+    Set<String> names = Sets.newHashSet(shreddedFields.keySet());
+
+    if (unshredded != null) {
+      Iterables.addAll(names, unshredded.fieldNames());
+    }
+
+    names.removeAll(removedFields);
+
+    return names;
+  }
+
+  @Override
+  public Iterable<String> fieldNames() {
+    return nameSet();
+  }
+
+  @Override
+  public int numFields() {
+    return nameSet().size();
+  }
+
+  public void remove(String field) {

Review Comment:
   Thanks for the explanation.
   
   If a field is missing and we remove it from `shreddedFields`, why do we still need `removedFields` to keep track of it? Would the following return the correct field list?
   
   ```
     private Set<String> nameSet() {
       Set<String> names = Sets.newHashSet(shreddedFields.keySet());
   
       if (unshredded != null) {
         Iterables.addAll(names, unshredded.fieldNames());
       }
   
       return names;
     }
   ```
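To compare the two versions of `nameSet()`, here is a simplified model. `ObjectSketch`, its `put` method, and the plain `Set<String>` standing in for the unshredded object are hypothetical simplifications, not Iceberg's API. The sketch assumes that `remove` drops the name from `shreddedFields` (as the comment describes) and records it in `removedFields`, and that `put` clears a prior removal. A field that exists only in `shreddedFields` disappears under either version, but a field that exists only in the unshredded object comes back in the suggested version, because the unshredded field names are re-added.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class RemovedFieldsSketch {
  // Hypothetical stand-in for ShreddedObject: unshredded field names are read-only,
  // while shredded fields can be added or removed.
  static class ObjectSketch {
    private final Set<String> unshreddedNames;
    private final Map<String, Object> shreddedFields = new HashMap<>();
    private final Set<String> removedFields = new HashSet<>();

    ObjectSketch(Set<String> unshreddedNames) {
      this.unshreddedNames = unshreddedNames;
    }

    // Assumption: re-adding a field cancels an earlier removal.
    void put(String field, Object value) {
      shreddedFields.put(field, value);
      removedFields.remove(field);
    }

    void remove(String field) {
      shreddedFields.remove(field);
      removedFields.add(field);
    }

    // Mirrors the nameSet() in the diff: union of both sources minus removed names.
    Set<String> nameSetWithTracking() {
      Set<String> names = new HashSet<>(shreddedFields.keySet());
      names.addAll(unshreddedNames);
      names.removeAll(removedFields);
      return names;
    }

    // Mirrors the suggested nameSet(): union of both sources only.
    Set<String> nameSetWithoutTracking() {
      Set<String> names = new HashSet<>(shreddedFields.keySet());
      names.addAll(unshreddedNames);
      return names;
    }
  }

  public static void main(String[] args) {
    ObjectSketch obj = new ObjectSketch(Set.of("a", "b"));
    obj.put("c", 3);
    obj.remove("a"); // "a" exists only in the unshredded object
    obj.remove("c"); // "c" exists only in shreddedFields

    System.out.println(new TreeSet<>(obj.nameSetWithTracking()));    // [b]
    System.out.println(new TreeSet<>(obj.nameSetWithoutTracking())); // [a, b]
  }
}
```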
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.