matthewmturner commented on issue #2109:
URL: 
https://github.com/apache/arrow-datafusion/issues/2109#issuecomment-1082557831


   Indeed, I was able to reproduce the performance regression when building from source:
   
   Master (possibly a few commits behind; I haven't pulled the latest in a few days)
   ```
   DataFusion CLI v7.0.0
   ❯ CREATE EXTERNAL TABLE taxi STORED AS CSV WITH HEADER ROW LOCATION './taxi.csv';
   0 rows in set. Query took 84.245 seconds.
   ```
   
   7.0.0
   ```
   DataFusion CLI v7.0.0
   ❯ CREATE EXTERNAL TABLE taxi STORED AS CSV WITH HEADER ROW LOCATION './taxi.csv';
   0 rows in set. Query took 112.486 seconds.
   ```
   
   6.0.0 (I think the version shown when launching is wrong)
   ```
   DataFusion CLI v5.1.0-SNAPSHOT
   
   ❯ CREATE EXTERNAL TABLE taxi STORED AS CSV WITH HEADER ROW LOCATION './taxi.csv';
   0 rows in set. Query took 2.645 seconds.
   ```
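
For scale, here is a rough sketch of the slowdown factors implied by the three timings above, taking the 6.0.0 run as the baseline. Each figure comes from the single run quoted above, so the ratios are only approximate; the dictionary keys are just labels chosen for this illustration.

```python
# Wall-clock times (seconds) for the CREATE EXTERNAL TABLE statement,
# copied from the three CLI sessions above. Keys are illustrative labels.
timings = {"master": 84.245, "7.0.0": 112.486, "6.0.0": 2.645}

# Use the 6.0.0 run as the baseline for comparison.
baseline = timings["6.0.0"]

# Slowdown factor of each build relative to the baseline.
slowdown = {version: seconds / baseline for version, seconds in timings.items()}

for version, factor in slowdown.items():
    print(f"{version}: {factor:.1f}x")
```

On these numbers, master is roughly 32x slower than 6.0.0 and 7.0.0 is roughly 42x slower.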
   
   My guess is that this comes from an arrow-rs change (arrow-rs handles the IO), but I haven't been tracking the changes there in detail lately. I likely won't have time to dig into this further for a few days.
   
   @alamb does anything come to mind?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


