[GitHub] [arrow-datafusion] Jimexist commented on issue #1441: Incorrect results in datafusion

GitBox Mon, 13 Dec 2021 07:47:23 -0800


Jimexist commented on issue #1441:
URL: https://github.com/apache/arrow-datafusion/issues/1441#issuecomment-992609965



   Thinking out loud, validating with `pandas`:
   
   ```ipython
   In [10]: import pandas as pd
   
   In [11]: csv_pd = pd.read_csv('./csvs/stop.csv')
   
   In [12]: pq_pd = pd.read_parquet('./parquets/stops')
   
   In [13]: csv_pd.info()
   <class 'pandas.core.frame.DataFrame'>
   RangeIndex: 33254 entries, 0 to 33253
   Data columns (total 4 columns):
    #   Column     Non-Null Count  Dtype
   ---  ------     --------------  -----
    0   time       33254 non-null  object
    1   trip_tid   32113 non-null  float64
    2   trip_line  32126 non-null  object
    3   stop_name  705 non-null    object
   dtypes: float64(1), object(3)
   memory usage: 1.0+ MB
   
   In [14]: pq_pd.info()
   <class 'pandas.core.frame.DataFrame'>
   RangeIndex: 33254 entries, 0 to 33253
   Data columns (total 4 columns):
    #   Column     Non-Null Count  Dtype
   ---  ------     --------------  -----
    0   time       33254 non-null  datetime64[ns]
    1   trip_tid   32113 non-null  float64
    2   trip_line  32126 non-null  object
    3   stop_name  705 non-null    object
   dtypes: datetime64[ns](1), float64(1), object(2)
   memory usage: 1.0+ MB
   
   ``` 
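
The two `info()` outputs agree on row count and per-column null counts; the only visible difference is that `time` loads as `object` from CSV and as `datetime64[ns]` from Parquet. A minimal sketch of how that check could be taken one step further, assuming the two frames are expected to hold identical data once `time` is parsed (the helper name `compare_frames` is hypothetical and not part of the original comment):

```python
import pandas as pd


def compare_frames(csv_df: pd.DataFrame, pq_df: pd.DataFrame, time_col: str = "time") -> dict:
    """Compare a CSV-loaded frame with a Parquet-loaded frame (hypothetical helper)."""
    # read_csv keeps timestamps as strings (dtype object) unless parse_dates is given,
    # so parse the column before comparing it with the Parquet datetime column.
    csv_norm = csv_df.copy()
    csv_norm[time_col] = pd.to_datetime(csv_norm[time_col])

    return {
        # Same number of rows and columns.
        "same_shape": csv_norm.shape == pq_df.shape,
        # Same per-column null counts, which is what info() reports.
        "same_null_counts": csv_norm.isna().sum().equals(pq_df.isna().sum()),
        # Same dtypes and values, with NaNs in matching positions treated as equal.
        "same_values": csv_norm.equals(pq_df),
    }
```

With the frames from the session above this would be called as `compare_frames(csv_pd, pq_pd)`. If `same_values` came back `True`, both readers would be seeing the same data, which would point away from the files themselves as the source of the discrepancy.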

