RussellSpitzer commented on issue #15086:
URL: https://github.com/apache/iceberg/issues/15086#issuecomment-3774744335

Looking further, I think I found the real issue in the test code (which may also be a library bug?):
   
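```java
// Inside the per-row loop of the test; `metadata`, `i`, and SCHEMA are defined there.
Record rec = GenericRecord.create(SCHEMA.asStruct());
ShreddedObject obj = Variants.object(metadata);
// Each put() adds a shredded (typed) field; the untyped remainder stays null.
obj.put("id", Variants.of(1000L + i));
obj.put("name", Variants.of("user_" + i));
obj.put("city", Variants.of("city_" + i));

Variant value = Variant.of(metadata, obj);
```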
   
The `obj.put` calls add `name` and `city` (along with `id`) as "shredded" fields of the object rather than as part of its untyped value, so we end up with an object that looks like

`<untyped = null, typed <id, name, city>>`
   
Here is where the Parquet writer shreds the value:

https://github.com/apache/iceberg/blob/bfec39f64666b8d49dc5061a6c5cfa96062f613a/parquet/src/main/java/org/apache/iceberg/parquet/ParquetVariantWriters.java#L324-L359

It will not "unshred" the shredded fields in your object just because they are not listed in the shredding schema. Instead, it just extracts the shredded values and then copies over the untyped value portion. In your test case, that untyped portion is an empty object, so `name` and `city` never make it into the written `value`.
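To make that behavior concrete, here is a toy model of it. This is a hypothetical sketch, not Iceberg's `ParquetVariantWriters` code: `ShreddingModel`, `ModelObject`, `Written`, and `write` are invented names, and an object is modeled as two plain maps, a typed (shredded) part and an untyped remainder. With a shredding schema that lists only `id`, an object built with `put` loses `name` and `city`, while an object whose fields all sit in the untyped part keeps them in the residual `value`.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical toy model of the shredding behavior described above.
// NOT Iceberg's ParquetVariantWriters implementation; all names are invented.
public class ShreddingModel {

  /** A variant object: shredded (typed) fields plus an untyped remainder. */
  public record ModelObject(Map<String, Object> typed, Map<String, Object> untyped) {}

  /** What gets written: the typed_value fields and the residual value object. */
  public record Written(Map<String, Object> typedValue, Map<String, Object> value) {}

  /**
   * Fields listed in the shredding schema go to typed_value; the untyped remainder is
   * copied through (minus any schema fields). Typed fields that the schema does not
   * list are never moved back into the remainder, so they are dropped.
   */
  public static Written write(ModelObject obj, Set<String> shreddedFields) {
    Map<String, Object> typedValue = new TreeMap<>();
    for (String field : shreddedFields) {
      if (obj.typed().containsKey(field)) {
        typedValue.put(field, obj.typed().get(field));
      } else if (obj.untyped().containsKey(field)) {
        typedValue.put(field, obj.untyped().get(field));
      }
    }

    Map<String, Object> residual = new TreeMap<>();
    for (Map.Entry<String, Object> entry : obj.untyped().entrySet()) {
      if (!shreddedFields.contains(entry.getKey())) {
        residual.put(entry.getKey(), entry.getValue());
      }
    }
    return new Written(typedValue, residual);
  }

  public static void main(String[] args) {
    Set<String> schema = Set.of("id"); // the shredding schema only lists "id"
    Map<String, Object> fields = Map.of("id", 1000L, "name", "user_0", "city", "city_0");

    // Built with obj.put(...): every field is typed, the untyped part is empty.
    System.out.println(write(new ModelObject(fields, Map.of()), schema));
    // prints Written[typedValue={id=1000}, value={}]

    // Built from serialized bytes: every field sits in the untyped part.
    System.out.println(write(new ModelObject(Map.of(), fields), schema));
    // prints Written[typedValue={id=1000}, value={city=city_0, name=user_0}]
  }
}
```

The model deliberately keeps "typed" and "untyped" as separate maps so the asymmetry is visible: only the untyped part is ever copied into the residual.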
   
I'm not sure this is a bug, but maybe we need to document `ShreddedObject` better.
   
   
If you instead use `VariantTestUtil` to construct the variant, it works correctly:
   
```java
private static List<Record> buildTestRecords() {
    List<Record> records = new ArrayList<>();
    ByteBuffer metadataBuffer =
        VariantTestUtil.createMetadata(List.of("id", "name", "city"), true);
    VariantMetadata metadata = Variants.metadata(metadataBuffer);

    for (int i = 0; i < 3; i++) {
        // Create the full object with all fields using VariantTestUtil
        ByteBuffer objectBuffer = VariantTestUtil.createObject(
            metadataBuffer,
            ImmutableMap.of(
                "id", Variants.of(1000L + i),
                "name", Variants.of("user_" + i),
                "city", Variants.of("city_" + i)));

        // Convert to VariantObject
        VariantObject variantObject =
            (VariantObject) Variants.value(metadata, objectBuffer);
        Variant value = Variant.of(metadata, variantObject);

        Record rec = GenericRecord.create(SCHEMA.asStruct());
        rec.setField("data", value);
        records.add(rec);
    }

    return records;
}
```
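For intuition about the bytes involved, the sketch below encodes a variant object by hand following the Parquet Variant binary encoding spec. It is not `VariantTestUtil.createObject`'s code: `VariantObjectEncoder` is a hypothetical helper that supports only short strings, int64 values (an assumption about how `1000L + i` is stored), and 1-byte field ids and offsets. Encoding just `city` and `name` against the sorted dictionary `[city, id, name]` reproduces the Row 0 `value (hex)` in the output below, i.e. the residual left after `id` is shredded out.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the Parquet Variant object encoding. Supports only short strings
// (<= 63 UTF-8 bytes), int64 values, and 1-byte field ids/offsets. Not VariantTestUtil's code.
public class VariantObjectEncoder {

  static byte[] encodeValue(Object v) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    if (v instanceof String s) {
      byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
      if (utf8.length > 63) {
        throw new IllegalArgumentException("only short strings are supported");
      }
      out.write((utf8.length << 2) | 1); // basic_type 1 = short string, length in upper 6 bits
      out.writeBytes(utf8);
    } else if (v instanceof Long l) {
      out.write(6 << 2); // basic_type 0 = primitive, primitive type 6 = int64
      for (int i = 0; i < 8; i++) {
        out.write((int) (l >>> (8 * i)) & 0xFF); // little-endian
      }
    } else {
      throw new IllegalArgumentException("unsupported value: " + v);
    }
    return out.toByteArray();
  }

  /** `dictionary` is the metadata key list, so a field id is the key's list index. */
  public static byte[] encodeObject(List<String> dictionary, Map<String, Object> fields) {
    TreeMap<String, Object> sorted = new TreeMap<>(fields); // fields are ordered by name
    ByteArrayOutputStream ids = new ByteArrayOutputStream();
    ByteArrayOutputStream offsets = new ByteArrayOutputStream();
    ByteArrayOutputStream data = new ByteArrayOutputStream();
    for (Map.Entry<String, Object> entry : sorted.entrySet()) {
      int id = dictionary.indexOf(entry.getKey());
      if (id < 0) {
        throw new IllegalArgumentException("not in metadata: " + entry.getKey());
      }
      ids.write(id);
      offsets.write(data.size()); // where this field's value starts in the data section
      data.writeBytes(encodeValue(entry.getValue()));
    }
    offsets.write(data.size()); // trailing offset = total data size
    if (dictionary.size() > 256 || sorted.size() > 255 || data.size() > 255) {
      throw new IllegalArgumentException("sketch only supports 1-byte ids and offsets");
    }

    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(0x02); // basic_type 2 = object; 1-byte ids and offsets, is_large = 0
    out.write(sorted.size()); // num_elements (1 byte because is_large = 0)
    out.writeBytes(ids.toByteArray());
    out.writeBytes(offsets.toByteArray());
    out.writeBytes(data.toByteArray());
    return out.toByteArray();
  }

  public static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      sb.append(String.format("%02x", b & 0xFF));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    List<String> dictionary = List.of("city", "id", "name"); // sorted metadata keys
    // Residual after `id` is shredded out.
    System.out.println(hex(encodeObject(dictionary, Map.of("name", "user_0", "city", "city_0"))));
    // prints 0202000200070e19636974795f3019757365725f30
    // Full object with `id` still inside, encoded as int64 1000.
    System.out.println(
        hex(encodeObject(dictionary, Map.of("id", 1000L, "name", "user_0", "city", "city_0"))));
  }
}
```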
   
   
Output from my own test (the residual `value` holds `city` and `name`, while `id` is shredded into `typed_value`):
   ```
   Row 0:
     metadata: 11030004060a6369747969646e616d65
     metadata (decoded): ......cityidname
     value (hex): 0202000200070e19636974795f3019757365725f30
     value (ASCII dump): [02][02][00][02][00][07][0e][19]city_0[19]user_0
     value contains 'user_0': true
     value contains 'city_0': true
     typed_value/id/typed_value: 1000
   
   Row 1:
     metadata: 11030004060a6369747969646e616d65
     metadata (decoded): ......cityidname
     value (hex): 0202000200070e19636974795f3119757365725f31
     value (ASCII dump): [02][02][00][02][00][07][0e][19]city_1[19]user_1
     value contains 'user_1': true
     value contains 'city_1': true
     typed_value/id/typed_value: 1001
   
   Row 2:
     metadata: 11030004060a6369747969646e616d65
     metadata (decoded): ......cityidname
     value (hex): 0202000200070e19636974795f3219757365725f32
     value (ASCII dump): [02][02][00][02][00][07][0e][19]city_2[19]user_2
     value contains 'user_2': true
     value contains 'city_2': true
     typed_value/id/typed_value: 1002
   ```
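The hex dumps can be checked by hand against the Parquet Variant binary encoding spec. A minimal decoding sketch follows; `VariantHexDecoder` is a hypothetical helper, and it only handles what these rows need (metadata version 1, object fields that are short strings). For Row 0 it recovers the dictionary `[city, id, name]` and shows that the residual `value` holds only `city` and `name`, consistent with `id` appearing under `typed_value/id/typed_value`.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HexFormat;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: decodes Variant metadata and a simple object from hex, following
// the Parquet Variant binary encoding spec. Object field values must be short strings.
public class VariantHexDecoder {

  /** Reads a little-endian unsigned integer of `size` bytes. */
  static int readLE(byte[] b, int pos, int size) {
    int v = 0;
    for (int i = 0; i < size; i++) {
      v |= (b[pos + i] & 0xFF) << (8 * i);
    }
    return v;
  }

  /** Returns the metadata dictionary; a field id is an index into this list. */
  public static List<String> decodeMetadata(byte[] m) {
    int header = m[0] & 0xFF;
    if ((header & 0x0F) != 1) {
      throw new IllegalArgumentException("unsupported metadata version");
    }
    int offsetSize = ((header >> 6) & 0x03) + 1; // bit 4 (sorted_strings) is ignored here
    int dictSize = readLE(m, 1, offsetSize);
    int offsetsStart = 1 + offsetSize;
    int stringsStart = offsetsStart + (dictSize + 1) * offsetSize;
    List<String> keys = new ArrayList<>();
    for (int i = 0; i < dictSize; i++) {
      int start = readLE(m, offsetsStart + i * offsetSize, offsetSize);
      int end = readLE(m, offsetsStart + (i + 1) * offsetSize, offsetSize);
      keys.add(new String(m, stringsStart + start, end - start, StandardCharsets.UTF_8));
    }
    return keys;
  }

  /** Decodes an object value whose fields are all short strings. */
  public static Map<String, String> decodeObject(List<String> keys, byte[] v) {
    int header = v[0] & 0xFF;
    if ((header & 0x03) != 2) {
      throw new IllegalArgumentException("not an object");
    }
    int valueHeader = header >> 2;
    int offsetSize = (valueHeader & 0x03) + 1;
    int idSize = ((valueHeader >> 2) & 0x03) + 1;
    int numSize = ((valueHeader >> 4) & 0x01) == 1 ? 4 : 1; // is_large
    int n = readLE(v, 1, numSize);
    int idsStart = 1 + numSize;
    int offsetsStart = idsStart + n * idSize;
    int dataStart = offsetsStart + (n + 1) * offsetSize;

    Map<String, String> fields = new LinkedHashMap<>();
    for (int i = 0; i < n; i++) {
      String name = keys.get(readLE(v, idsStart + i * idSize, idSize));
      int pos = dataStart + readLE(v, offsetsStart + i * offsetSize, offsetSize);
      int fieldHeader = v[pos] & 0xFF;
      if ((fieldHeader & 0x03) != 1) {
        throw new IllegalArgumentException("only short strings are supported");
      }
      fields.put(name, new String(v, pos + 1, fieldHeader >> 2, StandardCharsets.UTF_8));
    }
    return fields;
  }

  public static void main(String[] args) {
    HexFormat hex = HexFormat.of();
    List<String> keys = decodeMetadata(hex.parseHex("11030004060a6369747969646e616d65"));
    System.out.println(keys); // prints [city, id, name]
    Map<String, String> row0 =
        decodeObject(keys, hex.parseHex("0202000200070e19636974795f3019757365725f30"));
    System.out.println(row0); // prints {city=city_0, name=user_0}
  }
}
```

The same two calls work for Rows 1 and 2, since only the digit suffixes of the string values change.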

