rdblue commented on PR #455: URL: https://github.com/apache/arrow-go/pull/455#issuecomment-3152748447
@zeroshade, sorry for the confusion here. You're right about a lot of those test cases: they are not allowed by the spec. The implementation I generated these cases from is defensive and tries to read the data if it can rather than producing errors. I'd recommend doing the same to handle outside-of-spec cases.

For instance, if a column is missing, most implementations will let you project a column of nulls. Extending this idea to Variant, it's reasonable to assume that a missing `value` column indicates a column of nulls and to read accordingly instead of failing. The other cases are similar.

The most confusing one is where a field in the `value` of a struct is also a shredded field. The rationale here was that the shredded value should always take precedence, because the shredded value may be read without the rest of the struct (if you're projecting `extract_value(var, "$['b']", ...)`), and the behavior should not change based on the Parquet column projection.

The behavior in these cases was debated when we were working on the spec. We ultimately decided to disallow writers from producing them, but I think it is a best practice to ensure that the read behavior is predictable, accepts even slightly malformed cases, and is consistent regardless of the projected columns.
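
As a rough illustration of the defensive read behavior described above, here is a minimal Go sketch. It does not use the arrow-go API: `shreddedField` and `resolveObject` are hypothetical names invented for this example, and plain maps stand in for the decoded `value` and shredded columns. It assumes that a `nil` residual map models a missing (or unprojected) `value` column, and that any field name present in the shredded schema is resolved only from the shredded column, so a conflicting entry in `value` is ignored. Treating an absent shredded field as hiding the conflicting `value` entry is this sketch's reading of "behavior should not change based on projection", not something the spec requires.

```go
package main

import "fmt"

// shreddedField is a hypothetical stand-in for one shredded field of a
// Variant object in a single row. It is not an arrow-go type.
type shreddedField struct {
	Present bool // false when the shredded field is absent in this row
	Value   any  // decoded value, meaningful only when Present is true
}

// resolveObject merges the fields decoded from the `value` column (residual)
// with the shredded fields, reading defensively instead of returning an error.
//
//   - A nil residual map models a missing `value` column and is treated as
//     "no residual fields" rather than a failure.
//   - A name that exists in the shredded schema is always taken from the
//     shredded column, so the result matches what a projection of only that
//     shredded column would see (assumption of this sketch).
func resolveObject(residual map[string]any, shredded map[string]shreddedField) map[string]any {
	out := make(map[string]any, len(residual)+len(shredded))
	for name, v := range residual {
		if _, isShredded := shredded[name]; isShredded {
			continue // out-of-spec duplicate: the shredded column takes precedence
		}
		out[name] = v
	}
	for name, f := range shredded {
		if f.Present {
			out[name] = f.Value
		}
	}
	return out
}

func main() {
	shredded := map[string]shreddedField{"b": {Present: true, Value: "fresh"}}

	// Out-of-spec input: "b" appears both in `value` and as a shredded field.
	fmt.Println(resolveObject(map[string]any{"a": 1, "b": "stale"}, shredded))

	// Missing `value` column: read what is available instead of failing.
	fmt.Println(resolveObject(nil, shredded))
}
```

With this shape, reading only the shredded `b` column and reading the whole object agree on the value of `b`, which is the consistency property the comment argues for.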