[GitHub] [arrow-datafusion] alitrack opened a new issue #2044: wrong result when operation parquet

GitBox Sun, 20 Mar 2022 18:37:07 -0700


alitrack opened a new issue #2044:
URL: https://github.com/apache/arrow-datafusion/issues/2044



   **Describe the bug**
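   When a table is registered with `register_parquet`, datetime columns come back with wrong values; the same data registered with `register_csv` has no problem.
   Reading the file into a pandas DataFrame and registering it with `register_record_batches` also gives correct results.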
   
   **To Reproduce**
   Steps to reproduce the behavior:
   ```python
   import datafusion
   import pyarrow as pa
   
   ctx = datafusion.ExecutionContext()
   ctx.register_parquet('taxi_sample', 'yellow_taxi_sample.parquet')
   query = "select * from taxi_sample"
   pydf = ctx.sql(query)
   # Collect the record batches and convert them to a pandas DataFrame
   pa.Table.from_batches(pydf.collect()).to_pandas()
   ```
   
   **Expected behavior**
   Expected result:
   
   ```
        pickup_datetime
   0    2009-01-04 02:52:00
   1    2009-01-04 03:31:00
   2    2009-01-03 15:43:00
   ```
   
   Actual result:
   ```
   pickup_datetime
   0    1970-01-15 05:57:17.520
   1    1970-01-15 05:57:19.860
   2    1970-01-15 05:56:37.380
   ```
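
A quick arithmetic check on the two outputs above, as a sketch only: each wrong value equals the expected timestamp's offset from the Unix epoch divided by 1000. That is consistent with the column's time unit being read one step too fine (for example, milliseconds treated as microseconds), but the report does not confirm which units are involved, so treat that reading as an assumption. The helper name `shrink_by_1000` is made up for this illustration and is not part of datafusion or pyarrow.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

# Expected values copied from the report (naive datetimes, no time zone assumed).
expected = [
    datetime(2009, 1, 4, 2, 52, 0),
    datetime(2009, 1, 4, 3, 31, 0),
    datetime(2009, 1, 3, 15, 43, 0),
]


def shrink_by_1000(ts):
    """Hypothetical helper: divide the offset from the epoch by 1000."""
    # Whole microseconds since the epoch, as an integer.
    micros = (ts - EPOCH) // timedelta(microseconds=1)
    return EPOCH + timedelta(microseconds=micros // 1000)


misread = [shrink_by_1000(ts) for ts in expected]
for want, got in zip(expected, misread):
    print(want, "->", got)
```

The three values printed on the right match the `1970-01-15` timestamps in the actual result, down to the fractional seconds.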
   
   **Additional context**
   
   The sample data is part of [Year 2009-2015 - 1 billion rows - 107GB](https://vaex.s3.us-east-2.amazonaws.com/taxi/yellow_taxi_2009_2015_f32.hdf5)
   
   
[yellow_taxi_sample.parquet.zip](https://github.com/apache/arrow-datafusion/files/8312598/yellow_taxi_sample.parquet.zip)
   


