Jimexist edited a comment on issue #1441:
URL: 
https://github.com/apache/arrow-datafusion/issues/1441#issuecomment-992609965


   Thinking out loud: validating both files with `pandas`:
   
   ```ipython
   In [10]: import pandas as pd
   
   In [11]: csv_pd = pd.read_csv('./csvs/stop.csv')
   
   In [12]: pq_pd = pd.read_parquet('./parquets/stops')
   
   In [13]: csv_pd.info()
   <class 'pandas.core.frame.DataFrame'>
   RangeIndex: 33254 entries, 0 to 33253
   Data columns (total 4 columns):
    #   Column     Non-Null Count  Dtype
   ---  ------     --------------  -----
    0   time       33254 non-null  object
    1   trip_tid   32113 non-null  float64
    2   trip_line  32126 non-null  object
    3   stop_name  705 non-null    object
   dtypes: float64(1), object(3)
   memory usage: 1.0+ MB
   
   In [14]: pq_pd.info()
   <class 'pandas.core.frame.DataFrame'>
   RangeIndex: 33254 entries, 0 to 33253
   Data columns (total 4 columns):
    #   Column     Non-Null Count  Dtype
   ---  ------     --------------  -----
    0   time       33254 non-null  datetime64[ns]
    1   trip_tid   32113 non-null  float64
    2   trip_line  32126 non-null  object
    3   stop_name  705 non-null    object
   dtypes: datetime64[ns](1), float64(1), object(2)
   memory usage: 1.0+ MB
   
   ``` 
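   In the two `info()` outputs above, the row count and the non-null counts match. The only difference is that `time` is `object` when read from CSV but `datetime64[ns]` when read from parquet. A minimal sketch of one way to line the reads up: pass `parse_dates=["time"]` to `pd.read_csv`, then compare dtypes and non-null counts column by column. The sample rows and the `compare_frames` helper are hypothetical stand-ins, not taken from the real `stop.csv`, and the timestamp format is an assumption.

   ```python
   import io

   import pandas as pd
   from pandas.api.types import is_datetime64_any_dtype

   # Hypothetical sample rows standing in for ./csvs/stop.csv; the real file's
   # timestamp format is an assumption here.
   SAMPLE_CSV = """time,trip_tid,trip_line,stop_name
   2021-12-01 08:00:00,101,A,Central
   2021-12-01 08:05:00,,A,
   2021-12-01 08:10:00,102,,
   """


   def compare_frames(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
       """Hypothetical helper: dtype and non-null count per shared column."""
       cols = [c for c in left.columns if c in right.columns]
       return pd.DataFrame(
           {
               "left_dtype": [str(left[c].dtype) for c in cols],
               "right_dtype": [str(right[c].dtype) for c in cols],
               "left_non_null": [int(left[c].notna().sum()) for c in cols],
               "right_non_null": [int(right[c].notna().sum()) for c in cols],
           },
           index=cols,
       )


   # Default read: `time` stays as text, like the CSV read in the session above.
   csv_default = pd.read_csv(io.StringIO(SAMPLE_CSV))
   # Asking pandas to parse `time` gives it a datetime64 dtype, like the parquet read.
   csv_parsed = pd.read_csv(io.StringIO(SAMPLE_CSV), parse_dates=["time"])

   report = compare_frames(csv_default, csv_parsed)
   print(report)
   print("time parsed as datetime:", is_datetime64_any_dtype(csv_parsed["time"]))
   ```

   Comparing dtypes and non-null counts side by side, rather than only eyeballing `info()`, makes it easy to spot a mismatch like the `time` column above.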


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
