liukun4515 commented on PR #5605:
URL: https://github.com/apache/arrow-rs/pull/5605#issuecomment-2060927598

   > Maybe we can add some way to the parquet arrow reader to override its 
choice of data type for certain columns to allow users to specify types for 
cases where it is not clear from the parquet file itself.
   > 
   > @mapleFU and I have been discussing the need for something similar for 
deciding what Array type to use when reading strings from Parquet files on 
#5530 -- see [#5530 
(comment)](https://github.com/apache/arrow-rs/issues/5530#issuecomment-2052223254))
   > 
   > If we had such an API then people using spark created parquet files could 
specify that the timestamp column should always be UTC (as suggested by 
@tustvold ) without having to add an explicit cast afterwards
   
   I think adding such an option to the parquet reader is a workable solution, 
and it would help us resolve some of these issues.
   
   But many issues can't be resolved cleanly this way.
   
   As described in this comment: 
https://github.com/apache/arrow-datafusion/issues/9981#issuecomment-2058149468
   The same timestamp column may be used by users in different timezones, and 
they may want to compute on the same timestamps in their own timezone. A 
reader-level option fixes one timezone per column, so how would we do that? 
@tustvold Do we need to explicitly add or wrap a `cast` expr around the 
target timestamp column?
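   
   To make the concern concrete, here is a minimal sketch in plain Python 
(standard library only, not the arrow-rs or DataFusion API). The stored value, 
the fixed offsets, and the helper `view_in_zone` are all hypothetical and only 
illustrate that one stored instant yields different local dates depending on 
the timezone applied at query time:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical stored value: microseconds since the Unix epoch, the way an
# INT64 timestamp column holds it. The instant is 2024-04-17 20:30:00 UTC.
stored_micros = 1_713_385_800_000_000

EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def view_in_zone(micros: int, offset_hours: int) -> datetime:
    """Interpret the same stored instant in a fixed-offset timezone.

    The stored integer is never changed; only the timezone attached to it
    differs, which is roughly what a per-query cast would do.
    """
    instant = EPOCH_UTC + timedelta(microseconds=micros)
    return instant.astimezone(timezone(timedelta(hours=offset_hours)))


# Two hypothetical users looking at the same row.
user_a = view_in_zone(stored_micros, 8)    # UTC+08:00
user_b = view_in_zone(stored_micros, -7)   # UTC-07:00

print(user_a.isoformat())  # local date is 2024-04-18
print(user_b.isoformat())  # local date is 2024-04-17
```

   Both users read the same stored value, yet grouping by day would put the 
row in different buckets for each of them, so a single timezone chosen when 
the file is read cannot serve both.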


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
