zeroshade commented on PR #455: URL: https://github.com/apache/arrow-go/pull/455#issuecomment-3137832489
> Both value and typed_value are optional per spec and value can be missing as I understand.

While the spec states that `typed_value` may be omitted, it does not say the same about `value`. If the intent is that either can be omitted, the spec should be updated with that wording.

> `The value column of a partially shredded object must never contain fields represented by the Parquet columns in typed_value (shredded fields). Readers may always assume that data is written correctly and that shredded fields in typed_value are not present in value.` This test case is to prove that the reader will only read from `typed_value` and ignore the one from `value`. That means, the reader is not responsible to validate the duplicate key and the reader will read from `typed_value`.

The section you quoted states that the partially shredded object *must never* contain the fields and that a reader *may assume* that shredded fields aren't present in the `value` field. It also states that the reason why they must never be written that way is because it can result in inconsistent reader behavior. If the intent is for a reader to *always* read from *only* the `typed_value` field in the case of a conflict like this, then the language in the spec should be updated accordingly instead of the current "may" language.

> We will generate the schema first which will have both `value `and `typed_value` optional. But a `value` is to be shredded, the `value` column may be required. Do we fail in GO that `value` schema is optional?

Correct, the spec states that if the `typed_value` field is omitted, then the `value` field *must* be required, so Go errors if it is optional when the `typed_value` field is omitted causing this test case to fail.

> This is same as test case 43. My understanding is that if writer writes wrong data, the reader may only read the `typed_value`.

The spec says that's a *valid* thing to do, but it also says that this *must never happen* and doesn't definitively state what the behavior in this case should be. Only that it may be inconsistent. As I said above, if the intent is that the data in the `typed_value` field is given precedence, the spec should be updated to say that.
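
For illustration, here is a minimal Go sketch of the two checks discussed in this comment, under the reading of the spec given here: a shredded group must declare `value`, `value` must be required when `typed_value` is omitted, and a field name that appears in both `value` and `typed_value` is a writer error with no defined precedence. The `shreddedGroup` type and the `validateGroup` and `conflictingFields` functions are hypothetical names made up for this example; they are not the arrow-go implementation or API.

```go
package main

import (
	"errors"
	"fmt"
)

// shreddedGroup is a hypothetical, simplified description of one shredded
// variant group in a Parquet schema. It is not an arrow-go type.
type shreddedGroup struct {
	HasValue      bool // the group declares a `value` column
	ValueRequired bool // the `value` column uses REQUIRED repetition
	HasTypedValue bool // the group declares a `typed_value` column
}

var (
	errMissingValue        = errors.New("variant: `value` column is missing")
	errValueMustBeRequired = errors.New("variant: `value` must be required when `typed_value` is omitted")
)

// validateGroup applies the schema rules as read in this comment (an
// assumption about the spec's intent, not a quotation of it).
func validateGroup(g shreddedGroup) error {
	if !g.HasValue {
		return errMissingValue
	}
	if !g.HasTypedValue && !g.ValueRequired {
		return errValueMustBeRequired
	}
	return nil
}

// conflictingFields returns the names that appear both as shredded fields in
// `typed_value` and as keys of the residual object stored in `value`.
// A non-empty result means the writer violated the "must never" rule.
func conflictingFields(shredded, residual []string) []string {
	seen := make(map[string]struct{}, len(shredded))
	for _, name := range shredded {
		seen[name] = struct{}{}
	}
	var dup []string
	for _, name := range residual {
		if _, ok := seen[name]; ok {
			dup = append(dup, name)
		}
	}
	return dup
}

func main() {
	// Optional `value` without `typed_value`: rejected under this reading.
	fmt.Println(validateGroup(shreddedGroup{HasValue: true}))
	// Field "b" is present in both columns, so it is reported as a conflict.
	fmt.Println(conflictingFields([]string{"a", "b"}, []string{"b", "c"}))
}
```

Whether a reader should reject such a conflict or give `typed_value` precedence is exactly the open question above, so the sketch only reports the duplicates and leaves that decision to the caller.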