aihuaxu commented on PR #455:
URL: https://github.com/apache/arrow-go/pull/455#issuecomment-3137794519

   > Several test cases test variations on situations where the value column is 
missing. Based on my reading of the 
[spec](https://github.com/apache/parquet-format/blob/master/VariantShredding.md)
 this seems to be an invalid scenario. The specific case is that the spec 
states the typed_value field may be omitted when not shredding elements as a 
specific type, but says nothing about allowing omission of the value field. 
Currently, the Go implementation will error if this field is missing as per my 
reading of the spec, meaning those test cases fail.
   
   Both `value` and `typed_value` are optional per spec and `value` can be 
missing as I understand.
   <img width="1091" height="251" alt="image" 
src="https://github.com/user-attachments/assets/32432ac1-edea-477b-81c0-967dd9304115";
 />
   
    
   > Test case 43 testPartiallyShreddedObjectMissingFieldConflict seems to have 
a conflict between what is expected and what in the spec. The b field exists 
within the value field, while also being a shredded field, the test appears to 
assume the data in the value field would be ignored, but the 
[spec](https://github.com/apache/parquet-format/blob/master/VariantShredding.md#objects)
 says that value must never contain fields represented by the shredded fields. 
This needs clarification on the desired behavior and result.
   
   `The value column of a partially shredded object must never contain fields 
represented by the Parquet columns in typed_value (shredded fields). Readers 
may always assume that data is written correctly and that shredded fields in 
typed_value are not present in value.`  This test case is to prove that the 
reader will only read from `typed_value` and ignore the one from `value`. That 
means, the reader is not responsible to validate the duplicate key and the 
reader will read from `typed_value`.
   
   
   > Test case 84, testShreddedObjectWithOptionalFieldStructs tests the 
schenario where the shredded fields of an object are listed as optional in the 
schema, but the spec states that they must be required. Thus, the Go 
implementation errors on this test as the spec says this is an error. 
Clarification is needed on if this is a valid test case.
   
   This seems to be reasonable to error out for me to enforce it's required in 
the schema. 
   
   ```
       required group email {
         optional binary value;
         optional binary typed_value (STRING);
       }
   
   ```
   
   > Test case 38 testShreddedObjectMissingTypedValue tests the case where the 
typed_value field is missing, this is allowed by the spec except that the spec 
states that in this scenario the value field must be required. The test case 
uses optional in this scenario causing the Go implementation to fail. 
Clarification is needed here.
   
   We will generate the schema first which will have both `value` and 
`typed_value` optional. But a value is to be shredded, the `value`  column may 
be required. Do we fail in GO that `value` schema is optional?  
   
   
   > Test case 125, testPartiallyShreddedObjectFieldConflict again tests the 
case of a field existing in both the value and the shredded column which the 
spec states is invalid and will lead to inconsistent results. Thus it is not 
valid to have this test case assert a specific result according to the spec 
unless the spec is amended to state that the shredded field takes precedence in 
this case.
   
    This is same as test case 43. My understanding is that if writer writes 
wrong data, the reader may only read the `typed_value`.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to