rdblue commented on PR #455:
URL: https://github.com/apache/arrow-go/pull/455#issuecomment-3152748447

   @zeroshade, sorry for the confusion here. You're right about a lot of those 
test cases. They are not allowed by the spec. The implementation I generated 
these cases from is defensive and tries to read if it can rather than producing 
errors. I'd recommend doing the same thing to handle outside-of-spec cases.
   
   For instance, if a column is missing, most implementations will let you 
project it as a column of nulls. Extending this idea to Variant, it's 
reasonable to treat a missing `value` column as a column of nulls and read 
accordingly instead of failing. The other cases are similar.
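
   A minimal sketch of that fallback is below. The types and names 
(`shreddedColumns`, `cell`, `read`) are hypothetical and are not arrow-go's 
reader API; the `int64` shredded type is only for illustration, and both 
columns are assumed to have the same length when present.

```go
package main

// shreddedColumns is a hypothetical, simplified view of the two Parquet
// columns of a shredded Variant. A nil slice means the column is absent
// from the file; a nil element means the cell is null.
type shreddedColumns struct {
	value      [][]byte // serialized variant bytes per row
	typedValue []*int64 // shredded values per row (int64 for illustration)
}

// cell is the result of reading one row.
type cell struct {
	Typed *int64 // set when the shredded typed_value is present
	Raw   []byte // set when only the serialized value is present
}

// IsNull reports whether neither column supplied a value for the row.
func (c cell) IsNull() bool { return c.Typed == nil && c.Raw == nil }

// read returns row i, treating a missing column as a column of nulls
// instead of returning an error.
func (s shreddedColumns) read(i int) cell {
	var out cell
	if s.typedValue != nil {
		out.Typed = s.typedValue[i]
	}
	// Fall back to the serialized value only if that column exists.
	if out.Typed == nil && s.value != nil {
		out.Raw = s.value[i]
	}
	return out
}
```

   With `value` left nil (column missing), a row whose `typedValue` entry is 
also nil simply reads as null rather than failing.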
   
   The most confusing case is a field that appears both in the `value` of a 
struct and as a shredded field. The rationale here was that the shredded value 
should always take precedence, because the shredded value may be read without 
the rest of the struct (if you're projecting `extract_value(var, "$['b']", 
...)`), and the result should not change based on the Parquet column 
projection.
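
   To illustrate the precedence rule, here is a rough sketch. It models 
object fields as `map[string]string`, and `mergeObject` and `extractField` 
are hypothetical helpers, not functions from arrow-go or the spec.

```go
package main

// mergeObject combines the fields decoded from the residual `value` object
// with the shredded fields. On a name conflict the shredded field wins.
// Fields are modeled as strings purely for illustration.
func mergeObject(residual, shredded map[string]string) map[string]string {
	out := make(map[string]string, len(residual)+len(shredded))
	for k, v := range residual {
		out[k] = v
	}
	for k, v := range shredded {
		out[k] = v // overwrites any duplicate found in residual
	}
	return out
}

// extractField reads a single field. It checks the shredded fields first,
// so the answer is the same whether or not the residual `value` column
// was projected (residual may be nil when it was not read).
func extractField(residual, shredded map[string]string, name string) (string, bool) {
	if v, ok := shredded[name]; ok {
		return v, true
	}
	v, ok := residual[name]
	return v, ok
}
```

   If `b` is present in both places, a full read through `mergeObject` and a 
projected read through `extractField` with a nil `residual` both return the 
shredded `b`, so the result does not depend on which columns were read.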
   
   The behavior in these cases was debated when we were working on the spec. We 
ultimately decided to disallow writers from producing them, but I think it is a 
best practice to ensure that the read behavior is predictable, accepts even 
slightly malformed cases, and stays consistent regardless of which columns are 
projected.


