aihuaxu commented on code in PR #12105:
URL: https://github.com/apache/iceberg/pull/12105#discussion_r1931574393
##########
core/src/main/java/org/apache/iceberg/variants/PrimitiveWrapper.java:
##########
@@ -47,17 +48,23 @@ class PrimitiveWrapper<T> implements VariantPrimitive<T> {
private static final byte BINARY_HEADER =
VariantUtil.primitiveHeader(Primitives.TYPE_BINARY);
private static final byte STRING_HEADER =
VariantUtil.primitiveHeader(Primitives.TYPE_STRING);
- private final Variants.PhysicalType type;
+ private final PhysicalType type;
private final T value;
private ByteBuffer buffer = null;
- PrimitiveWrapper(Variants.PhysicalType type, T value) {
- this.type = type;
+ PrimitiveWrapper(PhysicalType type, T value) {
+ if (value instanceof Boolean
Review Comment:
Yeah, you are right. The `BOOLEAN_TRUE` and `BOOLEAN_FALSE` physical types need
special handling: when we shred them, they should be grouped together and
treated as a single type.
We may also need a separate type list {`NULL`, `BOOLEAN`, `INT8`,
`INT16`, etc.}, which is almost the same as the physical type list except that a
single `BOOLEAN` replaces `BOOLEAN_TRUE` and `BOOLEAN_FALSE`. Otherwise, we can't
represent the type of a shredded column that holds true/false values.
I'm fine with keeping it simple for now. We can revisit this if needed.
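A minimal sketch of the grouping described above. It assumes a trimmed-down, hypothetical copy of the physical types (the real `PhysicalType` enum has more values) and a hypothetical `ShreddedType` list in which one `BOOLEAN` stands for both boolean physical types. `ShreddedType` and `ShreddedTypes` are illustrative names, not part of this PR.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical, trimmed copy of the variant physical types (the real enum has more values).
enum PhysicalType {
  NULL, BOOLEAN_TRUE, BOOLEAN_FALSE, INT8, INT16, INT32, INT64, DOUBLE, STRING, BINARY
}

// Hypothetical type list for shredded columns: a single BOOLEAN covers true and false.
enum ShreddedType {
  NULL, BOOLEAN, INT8, INT16, INT32, INT64, DOUBLE, STRING, BINARY
}

final class ShreddedTypes {
  private ShreddedTypes() {}

  // Collapse the two boolean physical types; every other type maps to the same-named value.
  static ShreddedType of(PhysicalType physical) {
    switch (physical) {
      case BOOLEAN_TRUE:
      case BOOLEAN_FALSE:
        return ShreddedType.BOOLEAN;
      default:
        return ShreddedType.valueOf(physical.name());
    }
  }

  // Collect the distinct shredded types for a set of observed physical types.
  static Set<ShreddedType> shreddedTypes(PhysicalType... observed) {
    Set<ShreddedType> types = EnumSet.noneOf(ShreddedType.class);
    for (PhysicalType type : observed) {
      types.add(of(type));
    }
    return types;
  }
}
```

Because true and false map to the same shredded type, a column that sees both values still resolves to one type, which is what lets a single shredded boolean column be represented at all.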
##########
core/src/main/java/org/apache/iceberg/variants/ShreddedObject.java:
##########
@@ -35,22 +39,55 @@
* fields. This also does not allow updating or replacing the metadata for the unshredded object,
* which could require recursively rewriting field IDs.
*/
-class ShreddedObject implements VariantObject {
- private final SerializedMetadata metadata;
- private final SerializedObject unshredded;
+public class ShreddedObject implements VariantObject {
+ private final VariantMetadata metadata;
+ private final VariantObject unshredded;
private final Map<String, VariantValue> shreddedFields = Maps.newHashMap();
+ private final Set<String> removedFields = Sets.newHashSet();
private SerializationState serializationState = null;
- ShreddedObject(SerializedMetadata metadata) {
+ ShreddedObject(VariantMetadata metadata) {
this.metadata = metadata;
this.unshredded = null;
}
- ShreddedObject(SerializedObject unshredded) {
- this.metadata = unshredded.metadata();
+ ShreddedObject(VariantMetadata metadata, VariantObject unshredded) {
+ this.metadata = metadata;
this.unshredded = unshredded;
}
+ @VisibleForTesting
+ VariantMetadata metadata() {
+ return metadata;
+ }
+
+ private Set<String> nameSet() {
+ Set<String> names = Sets.newHashSet(shreddedFields.keySet());
+
+ if (unshredded != null) {
+ Iterables.addAll(names, unshredded.fieldNames());
+ }
+
+ names.removeAll(removedFields);
+
+ return names;
+ }
+
+ @Override
+ public Iterable<String> fieldNames() {
+ return nameSet();
+ }
+
+ @Override
+ public int numFields() {
+ return nameSet().size();
+ }
+
+ public void remove(String field) {
Review Comment:
Thanks for the explanation.
If a field is missing and we have already removed it from `shreddedFields`, why do
we still need `removedFields` to keep track of it? Would the following return the
correct field list?
```java
private Set<String> nameSet() {
  Set<String> names = Sets.newHashSet(shreddedFields.keySet());
  if (unshredded != null) {
    Iterables.addAll(names, unshredded.fieldNames());
  }
  return names;
}
```
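To make the question concrete, here is a minimal sketch that models the object with plain `java.util` collections instead of the Iceberg types. The class `FieldNameSketch` and its methods are hypothetical, the unshredded object is reduced to a fixed list of field names, and `put` clearing a prior removal is an assumption, not behavior taken from the PR. It compares the suggested `nameSet()` with the version in the diff for a field that exists only in the unshredded object.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for ShreddedObject; unshredded fields are modeled as a list of names.
final class FieldNameSketch {
  private final List<String> unshreddedNames;
  private final Map<String, Object> shreddedFields = new HashMap<>();
  private final Set<String> removedFields = new HashSet<>();

  FieldNameSketch(List<String> unshreddedNames) {
    this.unshreddedNames = unshreddedNames;
  }

  // Assumption: re-adding a field clears any earlier removal of it.
  void put(String field, Object value) {
    shreddedFields.put(field, value);
    removedFields.remove(field);
  }

  // Drop any shredded value and remember the removal.
  void remove(String field) {
    shreddedFields.remove(field);
    removedFields.add(field);
  }

  // Suggested version: union of shredded and unshredded names only.
  Set<String> namesWithoutRemovedSet() {
    Set<String> names = new HashSet<>(shreddedFields.keySet());
    names.addAll(unshreddedNames);
    return names;
  }

  // Version in the diff: the same union, then filter out removed names.
  Set<String> namesWithRemovedSet() {
    Set<String> names = namesWithoutRemovedSet();
    names.removeAll(removedFields);
    return names;
  }
}
```

In this model, calling `remove("b")` when `b` lives only in the unshredded list leaves `namesWithoutRemovedSet()` still reporting `b`, because the name comes back from the unshredded side of the union; only `namesWithRemovedSet()` filters it out.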
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.