zeroshade commented on PR #455:
URL: https://github.com/apache/arrow-go/pull/455#issuecomment-3137832489

   > Both value and typed_value are optional per spec and value can be missing 
as I understand.
   
   While the spec states that `typed_value` may be omitted, it does not say the 
same about `value`. If the intent is that either can be omitted, the spec 
should be updated with that wording. 
   
   > `The value column of a partially shredded object must never contain fields 
represented by the Parquet columns in typed_value (shredded fields). Readers 
may always assume that data is written correctly and that shredded fields in 
typed_value are not present in value.` This test case is to prove that the 
reader will only read from `typed_value` and ignore the one from `value`. That 
means, the reader is not responsible to validate the duplicate key and the 
reader will read from `typed_value`.
   
   The section you quoted states that a partially shredded object *must 
never* contain those fields and that a reader *may assume* that shredded fields 
aren't present in the `value` field. It also states that they must never be 
written that way because doing so can result in inconsistent reader behavior. 
If the intent is for a reader to *always* read from *only* the `typed_value` 
field in the case of a conflict like this, then the spec should say so 
explicitly instead of using the current "may" language. 
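   
   To illustrate why "may assume" leaves the conflict case open, here is a 
minimal sketch of two readers that agree on correctly written data and diverge 
only when a shredded field is duplicated in `value`. The function names and the 
use of plain string maps are hypothetical and are not arrow-go APIs.

```go
package main

// mergeTypedWins is a hypothetical reader that lets shredded fields from
// typed_value override any same-named field found in value.
func mergeTypedWins(value, typed map[string]string) map[string]string {
	out := make(map[string]string, len(value)+len(typed))
	for k, v := range value {
		out[k] = v
	}
	for k, v := range typed {
		out[k] = v // typed_value is applied last, so it wins on conflict
	}
	return out
}

// mergeValueWins is a second hypothetical reader that applies the two
// sources in the opposite order, so value wins on conflict.
func mergeValueWins(value, typed map[string]string) map[string]string {
	out := make(map[string]string, len(value)+len(typed))
	for k, v := range typed {
		out[k] = v
	}
	for k, v := range value {
		out[k] = v // value is applied last, so it wins on conflict
	}
	return out
}
```

   With `value = {"a": "from-value"}` and `typed = {"a": "from-typed"}`, the two 
functions return different results for key `a`, which is the inconsistency a 
"may" wording permits. With no overlapping keys they return the same map.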
   
   > We will generate the schema first which will have both `value `and 
`typed_value` optional. But a `value` is to be shredded, the `value` column may 
be required. Do we fail in GO that `value` schema is optional?
   
   Correct, the spec states that if the `typed_value` field is omitted, then 
the `value` field *must* be required. Go therefore returns an error when 
`value` is optional and `typed_value` is omitted, which is why this test case 
fails.
   
   > This is same as test case 43. My understanding is that if writer writes 
wrong data, the reader may only read the `typed_value`.
   
   The spec says that's a *valid* thing to do, but it also says that this *must 
never happen* and doesn't definitively state what the behavior in this case 
should be, only that it may be inconsistent. As I said above, if the intent is 
that the data in the `typed_value` field is given precedence, the spec should 
be updated to say that.

