Jimexist commented on issue #1441:
URL: https://github.com/apache/arrow-datafusion/issues/1441#issuecomment-992576814


   Hi @franeklubi, thanks for the detailed report.
   
   To simplify bug reproduction, can you help me understand the difference between the Parquet and CSV data? Specifically:
   
   ```
   ❯ CREATE EXTERNAL TABLE stop STORED AS PARQUET LOCATION './parquets/stops';
   0 rows in set. Query took 0.001 seconds.
   ❯ select count(*) from stop;
   +-----------------+
   | COUNT(UInt8(1)) |
   +-----------------+
   | 33254           |
   +-----------------+
   1 row in set. Query took 0.007 seconds.
   ```
   
   ```
   ❯ CREATE EXTERNAL TABLE stop (time TEXT, trip_tid TEXT, trip_line TEXT, stop_name TEXT) STORED AS CSV LOCATION './csvs/stop.csv';
   0 rows in set. Query took 0.000 seconds.
   ❯ select count(*) from stop;
   +-----------------+
   | COUNT(UInt8(1)) |
   +-----------------+
   | 33255           |
   +-----------------+
   1 row in set. Query took 0.014 seconds.
   ```
   
   The two tables seem to have a different number of rows: 33254 from Parquet versus 33255 from CSV.
   
   The same applies to the trips data.
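
One possible cause of an off-by-one like this, offered only as a guess and not verified against the actual files: if `stop.csv` begins with a header line and the table is created without `WITH HEADER ROW`, the header would be counted as a data row. The sketch below illustrates the effect on a small hypothetical sample; `count_rows` and the sample rows are made up for illustration and are not taken from the issue's data.

```python
import csv
import io


def count_rows(text: str, has_header: bool) -> int:
    """Count CSV records, optionally treating the first record as a header."""
    reader = csv.reader(io.StringIO(text))
    rows = sum(1 for _ in reader)
    # Subtract the header record only when told the file has one.
    return rows - 1 if has_header and rows > 0 else rows


# Hypothetical sample, NOT the real stop.csv: one header line, two data lines.
sample = (
    "time,trip_tid,trip_line,stop_name\n"
    "08:00,1,A,Central\n"
    "08:05,1,A,Harbour\n"
)

# Reading without header handling counts the header line as data.
print(count_rows(sample, has_header=False))
# Reading with header handling counts only the data lines.
print(count_rows(sample, has_header=True))
```

If the real CSV does have a header line, the two counts would differ by exactly one, which matches the gap shown above. If it has no header, this explanation does not apply and the extra row comes from somewhere else.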


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

