Weijun-H commented on issue #8492:
URL: https://github.com/apache/arrow-datafusion/issues/8492#issuecomment-1855859104
> Tested with a simple `regexp_match` query on `index-0.parquet`: `arrow-datafusion` took 47.847 seconds.
>
> ```
> SELECT COUNT(*) FROM '*.parquet' WHERE
> ARRAY_LENGTH(
> REGEXP_MATCH(path, '\\.(asm|c|cc|cpp|cxx|h|hpp|rs|[Ff][0-9]{0,2}(?:or)?|go)$')
> ) > 0;
>
>
> DataFusion CLI v33.0.0
> +----------+
> | COUNT(*) |
> +----------+
> | 5834398 |
> +----------+
> 1 row in set. Query took 47.847 seconds.
>
> ../datafusion-cli/target/release/datafusion-cli -f d0.sql  225.03s user 5.77s system 477% cpu 48.353 total
> ```
>
> `duckdb` took 3.029 seconds.
>
> ```
> SELECT COUNT(*) FROM '*.parquet' WHERE
> ARRAY_LENGTH(
> REGEXP_EXTRACT_ALL(path, '\.(asm|c|cc|cpp|cxx|h|hpp|rs|[Ff][0-9]{0,2}(?:or)?|go)$')
> ) > 0;
>
>
> ┌──────────────┐
> │ count_star() │
> │ int64 │
> ├──────────────┤
> │ 5834398 │
> └──────────────┘
> ../../duckdb/build/release/duckdb -s "`cat d1.sql`"  15.70s user 1.78s system 576% cpu 3.029 total
> ```
This may be related to #8524.
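For a quick sanity check on the quoted numbers, the gap can be computed directly from the shell `time` output (`user`, `system`, and `total` fields) of both runs. The following is a small illustrative Python sketch. It only reuses figures from the quoted comment, and the dictionary layout and helper names (`cpu_seconds`, `cpu_utilization`) are hypothetical.

```python
# Timings copied from the quoted `time` output (seconds).
# "wall" is the shell's `total`, so both engines are measured the same way.
datafusion = {"wall": 48.353, "user": 225.03, "sys": 5.77}
duckdb = {"wall": 3.029, "user": 15.70, "sys": 1.78}


def cpu_seconds(t):
    """Total CPU time (user + system) consumed by the process."""
    return t["user"] + t["sys"]


def cpu_utilization(t):
    """CPU time divided by wall-clock time; 1.0 means one fully busy core."""
    return cpu_seconds(t) / t["wall"]


# How many times longer DataFusion ran, by wall clock and by CPU time.
wall_speedup = datafusion["wall"] / duckdb["wall"]
cpu_ratio = cpu_seconds(datafusion) / cpu_seconds(duckdb)

print(f"wall-clock ratio: {wall_speedup:.1f}x")
print(f"CPU-time ratio:   {cpu_ratio:.1f}x")
print(f"DataFusion cores busy: {cpu_utilization(datafusion):.2f}")
print(f"DuckDB cores busy:     {cpu_utilization(duckdb):.2f}")
```

On these figures both processes keep roughly five cores busy, and the CPU-time ratio (about 13x) is close to the wall-clock ratio (about 16x). That suggests the slowdown comes mostly from DataFusion doing more total CPU work for the same rows, not from weaker parallelism. This is an inference from two single runs, not a profile.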