[GitHub] [arrow-datafusion] sundy-li commented on issue #5404: Datafusion v19.rc1 scan parquet 20x slower than DuckDB v0.6.1

via GitHub Sun, 26 Feb 2023 17:58:25 -0800


sundy-li commented on issue #5404:
URL: https://github.com/apache/arrow-datafusion/issues/5404#issuecomment-1445557500


   @jychen7 I checked on my 16-core Linux machine with an SSD, and DuckDB still 
reads the Parquet file faster.
   
   DuckDB v0.6.0:
   ```
   D CREATE VIEW hits AS
   > SELECT *
   > REPLACE
   > (epoch_ms(EventTime * 1000) AS EventTime,
   >  DATE '1970-01-01' + INTERVAL (EventDate) DAYS AS EventDate)
   > FROM read_parquet('hits.parquet', binary_as_string=True);
   D
   D .timer on
   D select count(1),  max(URL) from hits;
   ┌──────────┬─────────────────────────────────────────┐
   │ count(1) │               max("URL")                │
   │  int64   │                 varchar                 │
   ├──────────┼─────────────────────────────────────────┤
   │ 99997497 │ https://yugra-advert2792270][to]=&input │
   └──────────┴─────────────────────────────────────────┘
   Run Time (s): real 0.957 user 21.641735 sys 4.245200
   ```
   
   DataFusion:
   ```
   ❯ select max("URL") from hits;
   +-----------------------------------------+
   | MAX(hits.URL)                           |
   +-----------------------------------------+
   | https://yugra-advert2792270][to]=&input |
   +-----------------------------------------+
   1 row in set. Query took 2.849 seconds.
   ```
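
To put the two timings side by side, here is a minimal sketch that computes the wall-clock ratio from the numbers reported above. The function name `slowdown` is hypothetical and only for illustration. Note that the two queries are not identical (the DuckDB run also computes `count(1)`), so the ratio is a rough indication rather than a strict like-for-like comparison.

```python
# Wall-clock times copied from the two transcripts above (seconds).
DUCKDB_REAL_S = 0.957      # "Run Time (s): real 0.957"
DATAFUSION_REAL_S = 2.849  # "Query took 2.849 seconds."


def slowdown(slower_s: float, faster_s: float) -> float:
    """Return how many times longer the slower run took (hypothetical helper)."""
    if faster_s <= 0:
        raise ValueError("faster_s must be positive")
    return slower_s / faster_s


ratio = slowdown(DATAFUSION_REAL_S, DUCKDB_REAL_S)
print(f"DataFusion took about {ratio:.2f}x as long as DuckDB on this machine")
```

On these numbers the gap is roughly 3x on this machine, well below the 20x in the issue title.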


