jeffreyssmith2nd commented on issue #7925:
URL: https://github.com/apache/datafusion/issues/7925#issuecomment-2130106971

   The case we're running into in InfluxDB when enabling timezones is slightly 
different. It is a parquet file with Timestamp without a timezone and then 
querying with either a timezone or in UTC. The odd part is that we have 
manually set the schema to be 'UTC', even though the backing parquet file is 
not. 
   
   >  the issue originates when the schema provided to the ParquetExec and used 
for planning doesn't match that of the underlying parquet file.
   
   I have not been able to reproduce the issue using the datafusion-cli, as 
there is no way (that I can see) to change the schema. It does behave as I 
would expect when the schema matches the files.
   
   This is the script I'm using to generate some parquet files.
   ```python
   import numpy as np
   import pandas as pd
   import pyarrow as pa
   import pyarrow.parquet as pq
   
   df = pd.DataFrame({
       'time': [
           pd.Timestamp(year=2024,month=1,day=1,second=0),
           pd.Timestamp(year=2024,month=1,day=1,second=1),
           pd.Timestamp(year=2024,month=1,day=1,second=2),
           pd.Timestamp(year=2024,month=1,day=1,second=3),
       ],
   })
   
   table = pa.Table.from_pandas(df)
   pq.write_table(table, 'example_no_tz.parquet')
   
   
   df = pd.DataFrame({
       'time': [
           pd.Timestamp(year=2024,month=1,day=1,second=0,tz='UTC'),
           pd.Timestamp(year=2024,month=1,day=1,second=1,tz='UTC'),
           pd.Timestamp(year=2024,month=1,day=1,second=2,tz='UTC'),
           pd.Timestamp(year=2024,month=1,day=1,second=3,tz='UTC'),
       ],
   })
   
   
   table = pa.Table.from_pandas(df)
   pq.write_table(table, 'example_with_tz_utc.parquet')
   ```
   Some sample queries:
   ```
   > SET datafusion.execution.parquet.pushdown_filters=true;
   0 row(s) fetched.
   Elapsed 0.003 seconds.
   
   > select * from './example_no_tz.parquet';
   +---------------------+
   | time                |
   +---------------------+
   | 2024-01-01T00:00:00 |
   | 2024-01-01T00:00:01 |
   | 2024-01-01T00:00:02 |
   | 2024-01-01T00:00:03 |
   +---------------------+
   4 row(s) fetched.
   Elapsed 0.003 seconds.
   
   > select * from './example_no_tz.parquet' where time >= 
'2024-01-01T00:00:02.000';
   +---------------------+
   | time                |
   +---------------------+
   | 2024-01-01T00:00:02 |
   | 2024-01-01T00:00:03 |
   +---------------------+
   2 row(s) fetched.
   Elapsed 0.009 seconds.
   
   > select * from './example_no_tz.parquet' where time >= 
('2024-01-01T00:00:02.000' at time zone 'Europe/Brussels');
   +---------------------+
   | time                |
   +---------------------+
   | 2024-01-01T00:00:00 |
   | 2024-01-01T00:00:01 |
   | 2024-01-01T00:00:02 |
   | 2024-01-01T00:00:03 |
   +---------------------+
   4 row(s) fetched.
   Elapsed 0.006 seconds.
   ```
   
   If you perform any queries against the example with timezone, where the 
predicate is not in the same timezone, you get a type coercion error, which I 
believe makes sense.
   
   ```
   > select * from './example_with_tz_utc.parquet';
   +----------------------+
   | time                 |
   +----------------------+
   | 2024-01-01T00:00:00Z |
   | 2024-01-01T00:00:01Z |
   | 2024-01-01T00:00:02Z |
   | 2024-01-01T00:00:03Z |
   +----------------------+
   4 row(s) fetched.
   Elapsed 0.004 seconds.
   
   > select * from './example_with_tz_utc.parquet' where time >= 
'2024-01-01T00:00:02.000';
   +----------------------+
   | time                 |
   +----------------------+
   | 2024-01-01T00:00:02Z |
   | 2024-01-01T00:00:03Z |
   +----------------------+
   2 row(s) fetched.
   Elapsed 0.007 seconds.
   
   > select * from './example_with_tz_utc.parquet' where time >= 
('2024-01-01T00:00:02.000' at time zone 'Europe/Brussels');
   Error during planning: Cannot infer common argument type for comparison 
operation Timestamp(Nanosecond, Some("UTC")) >= Timestamp(Nanosecond, 
Some("Europe/Brussels"))
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to