doupache commented on PR #12497:
URL: https://github.com/apache/datafusion/pull/12497#issuecomment-2362691758
Thanks @austin362667 and @alamb.
I have updated the PR and learned some Cargo tips from @austin362667.
Using a debug build during development compiles much faster than a release build.
```sh
# 1. Build the benchmarks in debug mode
cd benchmarks && cargo build
# 2. Convert the IMDB data to Parquet
cargo run --bin imdb -- convert --input ./data/imdb/ --output ./data/imdb/ \
  --format parquet
```
I also tested all 21 Parquet files as follows.
```sql
-- create table
CREATE EXTERNAL TABLE name (
id INTEGER NOT NULL PRIMARY KEY,
name STRING NOT NULL,
imdb_index STRING,
imdb_id INTEGER,
gender STRING,
name_pcode_cf STRING,
name_pcode_nf STRING,
surname_pcode STRING,
md5sum STRING
)
STORED AS PARQUET
LOCATION '../benchmarks/data/imdb/temp/name.parquet';
-- read
SELECT * FROM name LIMIT 5;
```
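Repeating that check by hand for every file is tedious, so here is a rough sketch of how it could be scripted. It is not what the PR does: `gen_checks` is a hypothetical helper name, the directory argument is an assumption, and it relies on `datafusion-cli` being able to query a Parquet file directly by its quoted path instead of creating an external table with an explicit schema.

```sh
# Sketch: print one smoke-test query per Parquet file in a directory.
# gen_checks is a hypothetical helper, not part of DataFusion or this PR.
gen_checks() {
  dir="$1"
  for f in "$dir"/*.parquet; do
    # Skip the unexpanded glob pattern when no Parquet files are present.
    [ -e "$f" ] || continue
    printf "SELECT * FROM '%s' LIMIT 5;\n" "$f"
  done
}

# Example usage (assumes datafusion-cli is installed; the path is illustrative):
#   gen_checks ../benchmarks/data/imdb > checks.sql
#   datafusion-cli --file checks.sql
```

This only confirms that each file can be read; it does not check the column types or constraints the way the explicit `CREATE EXTERNAL TABLE` statement above does.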
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]