RussellSpitzer commented on issue #15086:
URL: https://github.com/apache/iceberg/issues/15086#issuecomment-3774744335
Looking further, I think I found the real issue in the test code (which may
also be a library bug?):
```java
Record rec = GenericRecord.create(SCHEMA.asStruct());
ShreddedObject obj = Variants.object(metadata);
obj.put("id", Variants.of(1000L + i));
obj.put("name", Variants.of("user_" + i));
obj.put("city", Variants.of("city_" + i));
Variant value = Variant.of(metadata, obj);
```
The obj.put calls add id, name, and city as "shredded" fields in the object,
so we end up with an object that looks like
`<untyped = null, typed <id, name, city>>`
When the Parquet writer "shreds" the value
https://github.com/apache/iceberg/blob/bfec39f64666b8d49dc5061a6c5cfa96062f613a/parquet/src/main/java/org/apache/iceberg/parquet/ParquetVariantWriters.java#L324-L359
it will not "unshred" the shredded fields in your object just because they
are not listed in the shredded fields schema. Instead, it just extracts any
shredded values and then copies over the untyped value portion. In your test
case, that portion is an empty object.
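To make that concrete, here is a toy model of the behavior described above. It is only a sketch under stated assumptions, not Iceberg code: `ToyShreddedObject`, `ToyWrittenRow`, and `write` are hypothetical names, plain maps stand in for the variant encoding, and the logic mirrors the description in this comment rather than the actual `ParquetVariantWriters` implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Toy model only. None of these types are Iceberg classes. */
public class ShreddingSketch {

  /** Stand-in for an object with an untyped portion plus fields added via put(). */
  static final class ToyShreddedObject {
    final Map<String, Object> unshredded = new LinkedHashMap<>(); // the "untyped" value portion
    final Map<String, Object> shredded = new LinkedHashMap<>();   // fields added via put()

    void put(String name, Object value) {
      shredded.put(name, value);
    }

    Object get(String name) {
      return shredded.containsKey(name) ? shredded.get(name) : unshredded.get(name);
    }
  }

  /** Stand-in for what lands in the file: typed columns plus the residual value. */
  static final class ToyWrittenRow {
    final Map<String, Object> typedColumns = new LinkedHashMap<>();
    final Map<String, Object> residualValue = new LinkedHashMap<>();
  }

  /** Assumed writer behavior, as described in the comment above. */
  static ToyWrittenRow write(ToyShreddedObject obj, Set<String> shreddedSchemaFields) {
    ToyWrittenRow row = new ToyWrittenRow();
    // 1. Extract a value for every field listed in the shredded fields schema.
    for (String field : shreddedSchemaFields) {
      Object value = obj.get(field);
      if (value != null) {
        row.typedColumns.put(field, value);
      }
    }
    // 2. Copy over the untyped portion, minus the fields that went to typed columns.
    //    Fields that exist only in obj.shredded and are not in the schema are NOT
    //    folded back in here, so they never reach the file.
    for (Map.Entry<String, Object> e : obj.unshredded.entrySet()) {
      if (!shreddedSchemaFields.contains(e.getKey())) {
        row.residualValue.put(e.getKey(), e.getValue());
      }
    }
    return row;
  }

  public static void main(String[] args) {
    Set<String> schemaFields = Set.of("id"); // only "id" is shredded in the file schema

    // Case 1: every field added with put(), as in the failing test.
    ToyShreddedObject viaPut = new ToyShreddedObject();
    viaPut.put("id", 1000L);
    viaPut.put("name", "user_0");
    viaPut.put("city", "city_0");
    ToyWrittenRow lossy = write(viaPut, schemaFields);
    System.out.println("put():      typed=" + lossy.typedColumns
        + " residual=" + lossy.residualValue);

    // Case 2: every field lives in the untyped portion.
    ToyShreddedObject serialized = new ToyShreddedObject();
    serialized.unshredded.put("id", 1000L);
    serialized.unshredded.put("name", "user_0");
    serialized.unshredded.put("city", "city_0");
    ToyWrittenRow intact = write(serialized, schemaFields);
    System.out.println("serialized: typed=" + intact.typedColumns
        + " residual=" + intact.residualValue);
  }
}
```

In this model the first case keeps only `id` and writes an empty residual, while the second case keeps `id` in the typed column and `name` and `city` in the residual.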
I'm not sure this is a bug, but maybe we need to document the object better.
If you instead use VariantTestUtil to construct the variant, it works
correctly:
```java
private static List<Record> buildTestRecords() {
  List<Record> records = new ArrayList<>();
  ByteBuffer metadataBuffer =
      VariantTestUtil.createMetadata(List.of("id", "name", "city"), true);
  VariantMetadata metadata = Variants.metadata(metadataBuffer);
  for (int i = 0; i < 3; i++) {
    // Create the full object with all fields using VariantTestUtil
    ByteBuffer objectBuffer = VariantTestUtil.createObject(
        metadataBuffer,
        ImmutableMap.of(
            "id", Variants.of(1000L + i),
            "name", Variants.of("user_" + i),
            "city", Variants.of("city_" + i)
        )
    );
    // Convert to VariantObject
    VariantObject variantObject = (VariantObject)
        Variants.value(metadata, objectBuffer);
    Variant value = Variant.of(metadata, variantObject);
    Record rec = GenericRecord.create(SCHEMA.asStruct());
    rec.setField("data", value);
    records.add(rec);
  }
  return records;
}
```
Output from my own test:
```
Row 0:
metadata: 11030004060a6369747969646e616d65
metadata (decoded): ......cityidname
value (hex): 0202000200070e19636974795f3019757365725f30
value (ASCII dump): [02][02][00][02][00][07][0e][19]city_0[19]user_0
value contains 'user_0': true
value contains 'city_0': true
typed_value/id/typed_value: 1000
Row 1:
metadata: 11030004060a6369747969646e616d65
metadata (decoded): ......cityidname
value (hex): 0202000200070e19636974795f3119757365725f31
value (ASCII dump): [02][02][00][02][00][07][0e][19]city_1[19]user_1
value contains 'user_1': true
value contains 'city_1': true
typed_value/id/typed_value: 1001
Row 2:
metadata: 11030004060a6369747969646e616d65
metadata (decoded): ......cityidname
value (hex): 0202000200070e19636974795f3219757365725f32
value (ASCII dump): [02][02][00][02][00][07][0e][19]city_2[19]user_2
value contains 'user_2': true
value contains 'city_2': true
typed_value/id/typed_value: 1002
```