alamb commented on PR #7479:
URL: https://github.com/apache/arrow-rs/pull/7479#issuecomment-2859515712

The reason I wrote this PR is that I don't think it is clear how parquet / arrow schema conversions are handled, including the embedded arrow schema hint and the APIs that let people supply / modify their own hint.

> I think the major confusion, which this PR didn't create, but which it also doesn't really address is that the arrow schema provided may not be what the reader actually uses. If say the arrow schema says TimestampNanoseconds, but the parquet is actually TimestampMilliseconds, IIRC it will return TimestampMilliseconds.

My experience is that if a hint schema is provided but doesn't match what is read from the file, an error is raised: https://github.com/apache/arrow-rs/blob/812160005efe3afc63531b8ea051e1fa44a91f67/parquet/src/arrow/arrow_reader/mod.rs#L541-L540

> called `Result::unwrap()` on an `Err` value: ArrowError("incompatible arrow schema, the following fields could not be cast: [column1]")

The error is actually pretty bad. I'll make a new PR to improve that.
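
To make the behavior described above concrete, here is a minimal, standard-library-only sketch of a hint check that rejects a mismatched schema instead of silently falling back to the file's types. It is a simplified model, not the arrow-rs implementation: `DataType`, `Field`, `apply_hint`, and `resolve_schema` are hypothetical names, the single accepted reinterpretation (`Utf8` to `LargeUtf8`) is an assumption chosen for illustration, and the column-count message is made up. Only the "could not be cast" message text is taken from the error quoted above.

```rust
// Simplified, hypothetical model of a schema-hint check.
// None of these types or functions are the arrow-rs API.

#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Utf8,
    LargeUtf8,
    TimestampMillisecond,
    TimestampNanosecond,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
}

impl Field {
    fn new(name: &str, data_type: DataType) -> Self {
        Field { name: name.to_string(), data_type }
    }
}

/// Returns the hinted type when the hint is an accepted reinterpretation of
/// the file's type, otherwise keeps the file's own type.
/// ASSUMPTION: the single accepted pair below is illustrative only.
fn apply_hint(file: &DataType, hint: &DataType) -> DataType {
    match (file, hint) {
        (DataType::Utf8, DataType::LargeUtf8) => hint.clone(),
        _ => file.clone(),
    }
}

/// Resolves the schema a reader would use, or returns an error naming every
/// hinted field that could not be honored.
fn resolve_schema(file: &[Field], hint: &[Field]) -> Result<Vec<Field>, String> {
    if file.len() != hint.len() {
        // Hypothetical message for this sketch.
        return Err(format!(
            "hint has {} fields but file has {}",
            hint.len(),
            file.len()
        ));
    }

    // Apply the hint field by field where it is acceptable.
    let resolved: Vec<Field> = file
        .iter()
        .zip(hint)
        .map(|(f, h)| Field {
            name: f.name.clone(),
            data_type: apply_hint(&f.data_type, &h.data_type),
        })
        .collect();

    // Any field that still differs from the hint means the hint was not honored.
    let mismatched: Vec<&str> = resolved
        .iter()
        .zip(hint)
        .filter(|(r, h)| r != h)
        .map(|(_, h)| h.name.as_str())
        .collect();

    if mismatched.is_empty() {
        Ok(resolved)
    } else {
        Err(format!(
            "incompatible arrow schema, the following fields could not be cast: [{}]",
            mismatched.join(", ")
        ))
    }
}
```

In this model, a file whose `column1` is `TimestampMillisecond` combined with a hint of `TimestampNanosecond` returns the `Err` variant listing `column1`, while a `Utf8` column hinted as `LargeUtf8` resolves to the hinted schema. Note that the message names the offending field but not the requested or actual types, which is the kind of gap a better error could close.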