wesm commented on pull request #9280:
URL: https://github.com/apache/arrow/pull/9280#issuecomment-887924786
Some updated performance numbers (gcc 9.3, run locally on x86):
```
-------------------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations UserCounters...
-------------------------------------------------------------------------------------
BM_ExecBatchIterator/256      11314787 ns     11313272 ns           62 items_per_second=88.3918/s
BM_ExecBatchIterator/512       5670423 ns      5669598 ns          123 items_per_second=176.379/s
BM_ExecBatchIterator/1024      2903937 ns      2903272 ns          242 items_per_second=344.439/s
BM_ExecBatchIterator/2048      1461982 ns      1461711 ns          481 items_per_second=684.13/s
BM_ExecBatchIterator/4096       739382 ns       739235 ns          951 items_per_second=1.35275k/s
BM_ExecBatchIterator/8192       370264 ns       370207 ns         1892 items_per_second=2.70119k/s
BM_ExecBatchIterator/16384      186622 ns       186573 ns         3755 items_per_second=5.35983k/s
BM_ExecBatchIterator/32768       93581 ns        93567 ns         7437 items_per_second=10.6876k/s
```
The way to read this is that breaking an `ExecBatch` with 32 primitive array
fields into smaller `ExecBatch`es (and then accessing a data pointer in each
batch) has an overhead of approximately:
* 2800 nanoseconds per batch
* 88.6 nanoseconds per batch per field
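The two figures can be reproduced from the `Time` column with a little arithmetic. This sketch assumes (the numbers imply it, but the benchmark source is not shown here) that every iteration splits one batch of 2^20 rows, so the `/N` argument produces 2^20 / N sub-batches. `TIME_NS`, `TOTAL_ROWS`, `NUM_FIELDS` and `overhead_per_batch` are hypothetical names used only for illustration.

```python
# "Time" column from the benchmark output (ns per iteration), keyed by the
# chunk size passed to BM_ExecBatchIterator.
TIME_NS = {
    256: 11314787,
    512: 5670423,
    1024: 2903937,
    2048: 1461982,
    4096: 739382,
    8192: 370264,
    16384: 186622,
    32768: 93581,
}

# Assumptions: each iteration splits a single batch of 2**20 rows with 32 fields.
TOTAL_ROWS = 1 << 20
NUM_FIELDS = 32


def overhead_per_batch(chunk_size: int, time_ns: float) -> float:
    """Nanoseconds spent per produced sub-batch."""
    num_batches = TOTAL_ROWS // chunk_size
    return time_ns / num_batches


for chunk_size, time_ns in sorted(TIME_NS.items()):
    per_batch = overhead_per_batch(chunk_size, time_ns)
    per_field = per_batch / NUM_FIELDS
    print(f"{chunk_size:>6}: {per_batch:7.1f} ns/batch, {per_field:5.1f} ns/batch/field")
```

For the `/1024` case this gives about 2836 ns per batch and 88.6 ns per batch per field. Across all chunk sizes the per-batch cost stays between roughly 2760 and 2920 ns, which is why a single per-batch figure is a fair summary.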
So if you wanted to break a batch with 1M elements into batches of size 1024
for finer-grained parallel processing, you would pay about 2900 microseconds
(2.9 ms) just for the splitting. For comparison, on the same machine,
multiplying a 1M-element float64 NumPy array by 2 takes:
```
In [2]: arr = np.random.randn(1 << 20)
In [3]: timeit arr * 2
395 µs ± 8.61 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
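A rough comparison of the two measurements, assuming the 395 µs `%timeit` mean is representative. The benchmark splits all 32 fields while the NumPy timing covers a single array, so this mirrors the back-of-the-envelope comparison in the comment rather than a like-for-like measurement. `SPLIT_TIME_NS`, `NUMPY_MULTIPLY_NS`, `overhead_ratio` and `cheapest_small_chunk` are illustrative names.

```python
# Splitting cost per chunk size ("Time" column, ns per iteration).
SPLIT_TIME_NS = {
    256: 11314787,
    512: 5670423,
    1024: 2903937,
    2048: 1461982,
    4096: 739382,
    8192: 370264,
    16384: 186622,
    32768: 93581,
}
NUMPY_MULTIPLY_NS = 395_000  # mean of `timeit arr * 2` on 2**20 float64 values


def overhead_ratio(chunk_size: int) -> float:
    """How many times longer the splitting takes than the multiplication."""
    return SPLIT_TIME_NS[chunk_size] / NUMPY_MULTIPLY_NS


# Smallest chunk size whose splitting cost stays below the compute cost.
cheapest_small_chunk = min(
    size for size, t in SPLIT_TIME_NS.items() if t < NUMPY_MULTIPLY_NS
)

for size in sorted(SPLIT_TIME_NS):
    print(f"{size:>6}: splitting = {overhead_ratio(size):5.2f}x the multiply")
print("smallest chunk size with overhead < compute:", cheapest_small_chunk)
```

At a chunk size of 1024 the splitting costs about 7.35x the multiplication. The overhead only drops below the compute time at chunks of 8192 rows or larger.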
Since the splitting alone would cost roughly seven times as much as that
multiplication, this seems problematic if we wish to evaluate array expressions
on smaller batch sizes to keep more data in CPU caches. I'll bring this up on
the mailing list to see what people think.
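To make "evaluation on smaller batch sizes" concrete, here is a NumPy-only sketch of the idea. It is not Arrow's execution engine, and `chunked_multiply` is a hypothetical helper: the expression is applied one chunk at a time, writing into a preallocated output, so each step touches only `chunk_size` elements. The Python loop adds a fixed cost per chunk, playing a role similar to the per-`ExecBatch` overhead measured above.

```python
import numpy as np


def chunked_multiply(arr: np.ndarray, factor: float, chunk_size: int) -> np.ndarray:
    """Compute arr * factor one chunk at a time.

    Basic slices are views (no copy), and np.multiply writes straight into the
    matching slice of the preallocated output, so only chunk_size elements of
    input and output are touched per iteration.
    """
    out = np.empty_like(arr)
    for start in range(0, len(arr), chunk_size):
        stop = start + chunk_size  # slicing past the end is clamped by NumPy
        np.multiply(arr[start:stop], factor, out=out[start:stop])
    return out


arr = np.random.randn(1 << 20)
result = chunked_multiply(arr, 2.0, 1024)
```

Varying `chunk_size` under `%timeit` is a quick way to see the trade-off: smaller chunks keep the working set in cache but pay the per-chunk overhead more often.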
--
This is an automated message from the Apache Git Service.