Jimexist commented on issue #1441:
URL:
https://github.com/apache/arrow-datafusion/issues/1441#issuecomment-992609965
Thinking out loud: validating the two files with `pandas`:
```ipython
In [10]: import pandas as pd
In [11]: csv_pd = pd.read_csv('./csvs/stop.csv')
In [12]: pq_pd = pd.read_parquet('./parquets/stops')
In [13]: csv_pd.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33254 entries, 0 to 33253
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   time       33254 non-null  object
 1   trip_tid   32113 non-null  float64
 2   trip_line  32126 non-null  object
 3   stop_name  705 non-null    object
dtypes: float64(1), object(3)
memory usage: 1.0+ MB
In [14]: pq_pd.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33254 entries, 0 to 33253
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   time       33254 non-null  datetime64[ns]
 1   trip_tid   32113 non-null  float64
 2   trip_line  32126 non-null  object
 3   stop_name  705 non-null    object
dtypes: datetime64[ns](1), float64(1), object(2)
memory usage: 1.0+ MB
```
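The same comparison could also be done programmatically rather than by eyeballing two `info()` dumps. A minimal sketch, assuming both frames are already loaded; `compare_frames` is a hypothetical helper (not part of pandas or DataFrame tooling), and the tiny frames below are made-up stand-ins for the real stop data:

```python
import pandas as pd


def compare_frames(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Put per-column dtype and non-null count of two frames side by side."""
    summary = pd.DataFrame({
        "left_dtype": left.dtypes.astype(str),
        "right_dtype": right.dtypes.astype(str),
        "left_non_null": left.count(),
        "right_non_null": right.count(),
    })
    # Flag columns whose dtype or non-null count differs between the frames.
    summary["dtype_match"] = summary["left_dtype"] == summary["right_dtype"]
    summary["count_match"] = summary["left_non_null"] == summary["right_non_null"]
    return summary


# Stand-in data (illustrative values only, not taken from stop.csv).
csv_like = pd.DataFrame({
    "time": ["2021-01-01 00:00:00", "2021-01-01 00:01:00"],
    "trip_tid": [1.0, None],
})
# Mimic the parquet side, where `time` is already a datetime column.
pq_like = csv_like.assign(time=pd.to_datetime(csv_like["time"]))

report = compare_frames(csv_like, pq_like)
print(report)
```

On the real data this would be called as `compare_frames(csv_pd, pq_pd)`. If the only difference turns out to be the `time` dtype, passing `parse_dates=["time"]` to `pd.read_csv` is one way to have pandas parse that column on the CSV side as well.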