zagto commented on PR #13179:
URL: https://github.com/apache/arrow/pull/13179#issuecomment-1149122524

> Changing both would be more realistic if it isn't too much trouble. I wouldn't expect it to show too much new information but I've long since learned to assume I know anything when it comes to this kind of performance :laughing:
   
I've now changed both.
I noticed another interesting effect here. Intuitively, it would make sense to also use separate output/intermediate buffers for each batch, since `simple_expression` in particular benefits a lot from writing to the same buffer every time. It turns out Arrow already benefits from the same effect, probably because the allocator hands back the same memory each time. Without this effect, `simple_expression` does not benefit from small batch sizes at all:
```
ExecuteScalarExpressionBaseline<SimpleExpressionBaseline>/rows_per_batch:1000/real_time/threads:1       401312 ns   400307 ns 1735 batches_per_second=2.49183M/s rows_per_second=2.49183G/s
ExecuteScalarExpressionBaseline<SimpleExpressionBaseline>/rows_per_batch:1000/real_time/threads:16     1065807 ns 16879539 ns  656 batches_per_second=938.256k/s rows_per_second=938.256M/s
ExecuteScalarExpressionBaseline<SimpleExpressionBaseline>/rows_per_batch:10000/real_time/threads:1      397138 ns   396141 ns 1752 batches_per_second=251.802k/s rows_per_second=2.51802G/s
ExecuteScalarExpressionBaseline<SimpleExpressionBaseline>/rows_per_batch:10000/real_time/threads:16    1056364 ns 16569094 ns  640 batches_per_second=94.6644k/s rows_per_second=946.644M/s
ExecuteScalarExpressionBaseline<SimpleExpressionBaseline>/rows_per_batch:100000/real_time/threads:1     380833 ns   380450 ns 1832 batches_per_second=26.2583k/s rows_per_second=2.62583G/s
ExecuteScalarExpressionBaseline<SimpleExpressionBaseline>/rows_per_batch:100000/real_time/threads:16   1061477 ns 16735682 ns  640 batches_per_second=9.42083k/s rows_per_second=942.083M/s
ExecuteScalarExpressionBaseline<SimpleExpressionBaseline>/rows_per_batch:1000000/real_time/threads:1    369300 ns   368976 ns 1882 batches_per_second=2.70783k/s rows_per_second=2.70783G/s
ExecuteScalarExpressionBaseline<SimpleExpressionBaseline>/rows_per_batch:1000000/real_time/threads:16  1075881 ns 16966905 ns  624 batches_per_second=929.47/s rows_per_second=929.47M/s
```
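As a sanity check on how to read these counters, here is a minimal C++ sketch that recomputes them from the printed values. The helper names are hypothetical and not part of the Arrow benchmark code. In every row, `rows_per_second` equals `batches_per_second × rows_per_batch`, and `rows_per_second × real time` comes out at roughly one million rows per iteration, so each configuration appears to process the same total number of rows and differs only in how they are split into batches.

```cpp
#include <cassert>
#include <cmath>

// rows_per_second should be batches_per_second scaled by rows_per_batch.
inline double RowsPerSecond(double batches_per_second, double rows_per_batch) {
  return batches_per_second * rows_per_batch;
}

// Rows processed in one benchmark iteration: the rate times the real
// (wall-clock) time per iteration, which the output reports in nanoseconds.
inline double RowsPerIteration(double rows_per_second, double real_time_ns) {
  assert(real_time_ns > 0);
  return rows_per_second * real_time_ns * 1e-9;
}

// The printed counters are rounded to about six significant digits, so
// compare with a relative tolerance instead of exact equality.
inline bool ApproxEqual(double a, double b, double rel_tol = 1e-3) {
  return std::fabs(a - b) <= rel_tol * std::fabs(b);
}
```

For example, `RowsPerIteration(2.49183e9, 401312)` for the `rows_per_batch:1000/threads:1` row gives about 1.0e6.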

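To make the buffer effect concrete, here is a minimal C++ sketch of the three strategies being compared: reusing one output buffer, allocating a fresh buffer per batch and relying on the allocator, and keeping a distinct buffer for every batch. This is not the benchmark code from this PR; the expression (an element-wise `a + b` on int64 columns), the function names, and the checksum are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the simple expression: element-wise a + b.
// The real SimpleExpressionBaseline may compute something else.
inline void AddColumns(const int64_t* a, const int64_t* b, int64_t* out,
                       std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

// Sums a batch's output so the work cannot be optimized away.
inline int64_t Sum(const int64_t* out, std::size_t n) {
  int64_t total = 0;
  for (std::size_t i = 0; i < n; ++i) total += out[i];
  return total;
}

// Strategy 1: one output buffer reused for every batch. With small batches
// the same memory stays hot in cache across iterations.
inline int64_t RunReusedBuffer(const std::vector<int64_t>& a,
                               const std::vector<int64_t>& b,
                               std::size_t rows_per_batch) {
  assert(rows_per_batch > 0 && a.size() == b.size());
  std::vector<int64_t> out(rows_per_batch);
  int64_t checksum = 0;
  for (std::size_t start = 0; start < a.size(); start += rows_per_batch) {
    std::size_t n = std::min(rows_per_batch, a.size() - start);
    AddColumns(a.data() + start, b.data() + start, out.data(), n);
    checksum += Sum(out.data(), n);
  }
  return checksum;
}

// Strategy 2: a fresh buffer per batch, freed at the end of each iteration.
// Whether the next allocation returns the same block is up to the allocator;
// if it does, this behaves much like strategy 1.
inline int64_t RunFreshBufferPerBatch(const std::vector<int64_t>& a,
                                      const std::vector<int64_t>& b,
                                      std::size_t rows_per_batch) {
  assert(rows_per_batch > 0 && a.size() == b.size());
  int64_t checksum = 0;
  for (std::size_t start = 0; start < a.size(); start += rows_per_batch) {
    std::size_t n = std::min(rows_per_batch, a.size() - start);
    std::vector<int64_t> out(n);  // destroyed (freed) when the iteration ends
    AddColumns(a.data() + start, b.data() + start, out.data(), n);
    checksum += Sum(out.data(), n);
  }
  return checksum;
}

// Strategy 3: a distinct buffer for every batch, all kept alive until the
// function returns, so no two batches can write to the same memory.
inline int64_t RunDistinctBuffers(const std::vector<int64_t>& a,
                                  const std::vector<int64_t>& b,
                                  std::size_t rows_per_batch) {
  assert(rows_per_batch > 0 && a.size() == b.size());
  std::size_t num_batches = (a.size() + rows_per_batch - 1) / rows_per_batch;
  std::vector<std::vector<int64_t>> outs(num_batches);
  int64_t checksum = 0;
  for (std::size_t batch = 0; batch < num_batches; ++batch) {
    std::size_t start = batch * rows_per_batch;
    std::size_t n = std::min(rows_per_batch, a.size() - start);
    outs[batch].resize(n);
    AddColumns(a.data() + start, b.data() + start, outs[batch].data(), n);
    checksum += Sum(outs[batch].data(), n);
  }
  return checksum;
}
```

All three produce the same result. Only strategy 3 guarantees that every batch writes to fresh memory, which corresponds to the case where `simple_expression` no longer benefits from small batch sizes.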

-- 
This is an automated message from the Apache Git Service.