save-buffer opened a new pull request #12537:
URL: https://github.com/apache/arrow/pull/12537


   This PR adds a multithreaded TPC-H dbgen, as well as an implementation of 
Q1 as a Google Benchmark. The advantage of this dbgen approach is that it is a 
scan node: it generates data on the fly and streams it to the rest of the plan 
rather than materializing the tables first. As a result, I was able to run Q1 
at scale factor 1000 on my desktop with only 32 GB of RAM. 
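
   To illustrate why generating on the fly keeps memory bounded, here is a minimal single-threaded Python sketch. It is not the code in this PR and does not use any Arrow API; the names `generate_batches` and `sum_price_by_flag`, the two-column row layout, and the value ranges are all assumptions made for illustration.

```python
import random
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical stand-in for a lineitem row: (return_flag, extended_price).
Row = Tuple[str, float]


def generate_batches(total_rows: int, batch_size: int, seed: int = 0) -> Iterator[List[Row]]:
    """Yield synthetic rows one batch at a time instead of building the whole table."""
    rng = random.Random(seed)
    produced = 0
    while produced < total_rows:
        n = min(batch_size, total_rows - produced)
        # The flag alphabet and price range are arbitrary illustration values.
        yield [(rng.choice("ANR"), rng.uniform(900.0, 105000.0)) for _ in range(n)]
        produced += n


def sum_price_by_flag(batches: Iterable[List[Row]]) -> Dict[str, float]:
    """Aggregate incrementally; only the current batch and the totals are held in memory."""
    totals: Dict[str, float] = {}
    for batch in batches:
        for flag, price in batch:
            totals[flag] = totals.get(flag, 0.0) + price
    return totals


totals = sum_price_by_flag(generate_batches(total_rows=10_000, batch_size=1_024))
print(totals)
```

   Peak memory in this sketch depends on `batch_size`, not on `total_rows`, which is the property that makes large scale factors feasible on a machine with limited RAM.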
   
   I verified the Q1 results. They do not exactly match the reference results, 
but they are close and well within the variance I would expect between 
different random number generators. 
   ```
   ----------------------------------------------------------------
   Benchmark                      Time             CPU   Iterations
   ----------------------------------------------------------------
   BM_Tpch_Q1/SF:1        186609936 ns       268825 ns          100
   BM_Tpch_Q1/SF:10      1858114140 ns       276741 ns           10
   BM_Tpch_Q1/SF:100    18561088470 ns       273067 ns            1
   BM_Tpch_Q1/SF:1000  186103719755 ns       289445 ns            1
   ```
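
   The wall-clock times above can be checked for linear scaling with a few lines of arithmetic. This is only a sketch: the values are copied from the Time column of the table, and the helper name `seconds_per_sf` is made up for illustration.

```python
# Wall-clock times in nanoseconds, copied from the Time column above,
# keyed by scale factor (SF).
wall_ns = {
    1: 186609936,
    10: 1858114140,
    100: 18561088470,
    1000: 186103719755,
}


def seconds_per_sf(times_ns):
    """Return wall-clock seconds divided by scale factor for each entry."""
    return {sf: ns / 1e9 / sf for sf, ns in times_ns.items()}


per_sf = seconds_per_sf(wall_ns)
for sf, secs in per_sf.items():
    total = wall_ns[sf] / 1e9
    print(f"SF {sf:>4}: {total:8.3f} s total, {secs:.4f} s per unit of scale factor")
```

   Each scale factor works out to roughly 0.186 s per unit, so the wall-clock time grows essentially linearly from SF 1 to SF 1000.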


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
