liukun4515 commented on PR #5605: URL: https://github.com/apache/arrow-rs/pull/5605#issuecomment-2060927598
> Maybe we can add some way to the parquet arrow reader to override its choice of data type for certain columns to allow users to specify types for cases where it is not clear from the parquet file itself.
>
> @mapleFU and I have been discussing the need for something similar for deciding what Array type to use when reading strings from Parquet files on #5530 -- see [#5530 (comment)](https://github.com/apache/arrow-rs/issues/5530#issuecomment-2052223254)
>
> If we had such an API then people using spark created parquet files could specify that the timestamp column should always be UTC (as suggested by @tustvold) without having to add an explicit cast afterwards

I think adding such an option to the parquet reader is a workable solution, and it would help resolve some of these issues. But many cases still can't be resolved cleanly, as described in this comment: https://github.com/apache/arrow-datafusion/issues/9981#issuecomment-2058149468

The same timestamp column may be used by users in different time zones, and they may want to compute on the same timestamps using different time zones. How should that work?

@tustvold Do we need to explicitly add a `cast` expr to, or wrap one around, the target timestamp column?
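
To make the multi-timezone question concrete, here is a minimal sketch using only the Rust standard library, not the arrow-rs or DataFusion APIs. It assumes the column stores instants as microseconds since the Unix epoch in UTC, and it models each user's time zone as a fixed offset in seconds, which ignores DST rules. The helper `local_hour` is a hypothetical name introduced for illustration only.

```rust
/// Hour of day (0-23) of a stored UTC instant as seen at a fixed UTC offset.
///
/// `utc_micros`: microseconds since the Unix epoch, in UTC (assumed storage format).
/// `offset_secs`: fixed offset east of UTC in seconds (simplification: no DST).
fn local_hour(utc_micros: i64, offset_secs: i64) -> i64 {
    // Truncate to whole seconds, flooring so that pre-epoch values behave.
    let utc_secs = utc_micros.div_euclid(1_000_000);
    // Shift the instant into the viewer's local wall-clock time.
    let local_secs = utc_secs + offset_secs;
    // Seconds into the local day, then whole hours.
    local_secs.rem_euclid(86_400) / 3_600
}
```

The stored value never changes; only the offset applied at compute time does. For the instant six hours after the epoch (`21_600_000_000` microseconds), `local_hour` gives 6 at offset 0, 14 at `8 * 3600`, and 22 (on the previous local day) at `-8 * 3600`. That is why a single reader-level type override cannot serve users in different time zones on its own: the zone has to be supplied somewhere per query, for example by a cast.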