robtandy commented on PR #60:
URL: https://github.com/apache/datafusion-ray/pull/60#issuecomment-2651537054

   I've refactored the control of parallel execution to be more fine-grained.
`--partitions-per-worker` controls the number of partitions hosted by an
actor (each actor gets an entire Ray worker). So if the stage has 10
partitions, `concurrency=10` and `partitions-per-worker=4`, we'll spin up 40
actors to satisfy the query.
   
   Latest TPC-H SF100 results, compared with local DataFusion using all cores
on a 32-CPU machine with an NVMe drive:
   
   ```json
   {
       "engine": "datafusion-ray",
       "benchmark": "tpch",
       "settings": {
           "concurrency": 16,
           "batch_size": 8192,
           "prefetch_buffer_size": 0,
           "partitions_per_worker": 4
       },
       "data_path": "file:///data2/sf100/",
       "queries": {
           "1": 14.63571572303772,
           "2": 16.11984419822693,
           "3": 20.260254621505737,
           "4": 16.40132737159729,
           "5": 35.4002046585083,
           "6": 7.793532609939575,
           "7": 49.56708884239197,
           "8": 37.00137710571289,
           "9": 60.13660907745361,
           "10": 38.18756365776062,
           "11": 13.499444484710693,
           "12": 18.93906331062317,
           "13": 14.921503782272339,
           "14": 7.416260004043579,
           "15": 2.373532295227051,
           "16": 8.229618549346924,
           "17": 52.57597255706787,
           "18": 85.48271942138672,
           "19": 10.138697862625122,
           "20": 15.182426929473877,
           "21": 78.81208372116089,
           "22": 8.711960792541504
       },
       "local_queries": {
           "1": 14.912381172180176,
           "2": 10.478784322738647,
           "3": 8.960041284561157,
           "4": 3.8824241161346436,
           "5": 15.605360507965088,
           "6": 1.672469139099121,
           "7": 28.076196432113647,
           "8": 14.546991348266602,
           "9": 26.64270520210266,
           "10": 11.699812173843384,
           "11": 4.682126522064209,
           "12": 4.03217339515686,
           "13": 8.454285621643066,
           "14": 2.875070095062256,
           "15": 0.0013363361358642578,
           "16": 2.2461774349212646,
           "17": 26.58483576774597,
           "18": 61.40281629562378,
           "19": 5.444426774978638,
           "20": 7.112048625946045,
           "21": 36.257577657699585,
           "22": 2.517507314682007
       },
       "validated": {
           "1": true,
           "2": true,
           "3": true,
           "4": true,
           "5": true,
           "6": true,
           "7": true,
           "8": true,
           "9": true,
           "10": true,
           "11": true,
           "12": true,
           "13": true,
           "14": true,
           "15": true,
           "16": true,
           "17": true,
           "18": true,
           "19": true,
           "20": true,
           "21": true,
           "22": true
       }
   }
   ```
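
To make the comparison easier to read, here is a minimal sketch that turns a results file shaped like the one above into per-query ratios of distributed time to local time. The function name `slowdown` and the `sample` dict are illustrative only and are not part of datafusion-ray; the two sample entries are copied from queries 1 and 6 above.

```python
def slowdown(results: dict) -> dict[str, float]:
    """Map each query id to (datafusion-ray seconds / local seconds).

    Assumes `results` has "queries" and "local_queries" keys, as in the
    benchmark JSON above. A ratio above 1.0 means the distributed run was
    slower than the local run for that query.
    """
    ratios = {}
    for query_id, ray_seconds in results["queries"].items():
        local_seconds = results["local_queries"].get(query_id)
        if not local_seconds:
            # Skip queries with a missing or zero local timing.
            continue
        ratios[query_id] = ray_seconds / local_seconds
    return ratios


# Two entries copied from the results above (queries 1 and 6).
sample = {
    "queries": {"1": 14.63571572303772, "6": 7.793532609939575},
    "local_queries": {"1": 14.912381172180176, "6": 1.672469139099121},
}

ratios = slowdown(sample)
for query_id, ratio in sorted(ratios.items(), key=lambda kv: int(kv[0])):
    print(f"q{query_id}: {ratio:.2f}x")
```

To run it on the full results, load the file with `json.load` and pass the resulting dict to `slowdown`. Ratios for queries with very small local timings (query 15 here) are dominated by noise and probably should not be read too closely.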


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

