alamb commented on PR #7479:
URL: https://github.com/apache/arrow-rs/pull/7479#issuecomment-2859515712

   I wrote this PR because I don't think it is clear how 
parquet / arrow schema conversions are handled, including the embedded arrow 
schema hint and the APIs that let people supply / modify their own hint.
   
   
   > I think the major confusion, which this PR didn't create, but which it 
also doesn't really address is that the arrow schema provided may not be what 
the reader actually uses. If say the arrow schema says TimestampNanoseconds, 
but the parquet is actually TimestampMilliseconds, IIRC it will return 
TimestampMilliseconds.
   
   My experience is that if the hint schema is provided but doesn't match what 
is read from the file, an error is raised: 
   
   
https://github.com/apache/arrow-rs/blob/812160005efe3afc63531b8ea051e1fa44a91f67/parquet/src/arrow/arrow_reader/mod.rs#L541-L540
   
   > called `Result::unwrap()` on an `Err` value: ArrowError("incompatible 
arrow schema, the following fields could not be cast: [column1]")
   
   The error is actually pretty bad. I'll make a new PR to improve that.
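
   To make the behavior described above concrete, here is a minimal sketch in plain Rust of a reader that checks a supplied hint against the schema inferred from the file and reports every field where the hint could not be applied. This is a toy model, not the arrow-rs implementation: `ColType`, `apply_hint`, and `resolve_schema` are hypothetical names, and the rule for which hints are accepted is an assumption chosen for illustration. Only the text of the mismatch error mirrors the message quoted above.

```rust
// Toy model only: none of these names exist in arrow-rs.
#[derive(Debug, Clone, PartialEq)]
enum ColType {
    Utf8,
    LargeUtf8,
    TimestampMillis,
    TimestampNanos,
}

/// Assumed rule for this sketch: a hint is honoured if it equals the inferred
/// type, or if it widens Utf8 to LargeUtf8. Otherwise the inferred type is kept.
fn apply_hint(inferred: &ColType, hint: &ColType) -> ColType {
    match (inferred, hint) {
        (a, b) if a == b => hint.clone(),
        (ColType::Utf8, ColType::LargeUtf8) => hint.clone(),
        _ => inferred.clone(),
    }
}

/// Returns the resolved schema, or an error naming every field whose hinted
/// type differs from the type the reader would actually produce.
/// Fields are matched by position; hint field names are ignored in this sketch.
fn resolve_schema(
    file: &[(&str, ColType)],
    hint: &[(&str, ColType)],
) -> Result<Vec<(String, ColType)>, String> {
    if file.len() != hint.len() {
        return Err(format!(
            "hint has {} columns but the file has {}",
            hint.len(),
            file.len()
        ));
    }
    let mut resolved = Vec::new();
    let mut bad = Vec::new();
    for ((name, inferred), (_, wanted)) in file.iter().zip(hint) {
        let got = apply_hint(inferred, wanted);
        if &got != wanted {
            bad.push(name.to_string());
        }
        resolved.push((name.to_string(), got));
    }
    if bad.is_empty() {
        Ok(resolved)
    } else {
        Err(format!(
            "incompatible arrow schema, the following fields could not be cast: [{}]",
            bad.join(", ")
        ))
    }
}

fn main() {
    // The file stores millisecond timestamps; the hint asks for nanoseconds.
    let file = [
        ("column1", ColType::TimestampMillis),
        ("column2", ColType::Utf8),
    ];
    let hint = [
        ("column1", ColType::TimestampNanos),
        ("column2", ColType::LargeUtf8),
    ];
    match resolve_schema(&file, &hint) {
        Ok(schema) => println!("resolved: {:?}", schema),
        Err(e) => println!("error: {e}"),
    }
}
```

   In this model the mismatched `column1` hint produces an error rather than a silent fallback to the file's type, which is the distinction being discussed above.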
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
