andygrove commented on pull request #8409:
URL: https://github.com/apache/arrow/pull/8409#issuecomment-706429236
The results are pretty interesting to me.
Without `--mem-table`:
```
Running benchmarks with the following options: TpchOpt { query: 1, debug: false, iterations: 3, concurrency: 24, batch_size: 4096, path: "/mnt/tpch/s1/parquet", file_format: "parquet", mem_table: false }
Query 1 iteration 0 took 241 ms
Query 1 iteration 1 took 164 ms
Query 1 iteration 2 took 167 ms
```
With `--mem-table`:
```
Running benchmarks with the following options: TpchOpt { query: 1, debug: false, iterations: 3, concurrency: 24, batch_size: 4096, path: "/mnt/tpch/s1/parquet", file_format: "parquet", mem_table: true }
Loading data into memory
Loaded data into memory in 11240 ms
Query 1 iteration 0 took 353 ms
Query 1 iteration 1 took 302 ms
Query 1 iteration 2 took 322 ms
```
I filed https://issues.apache.org/jira/browse/ARROW-10251 to fix the single-threaded loading in MemTable. However, I'm not sure why the query itself is slower against the in-memory table (302-353 ms per iteration) than against Parquet directly (164-241 ms per iteration).