robtandy commented on PR #60: URL: https://github.com/apache/datafusion-ray/pull/60#issuecomment-2651537054
I've refactored the control of parallel execution to be finer-grained. `--partitions-per-worker` controls the number of partitions hosted by an actor (which gets an entire Ray worker). So if the stage has 10 partitions, `concurrency=10` and `partitions-per-worker=4`, we'll spin up 40 actors to satisfy the query.

Latest TPC-H SF100 results, compared with local DataFusion using all cores on a 32-CPU machine with an NVMe drive:

```json
{
  "engine": "datafusion-ray",
  "benchmark": "tpch",
  "settings": {
    "concurrency": 16,
    "batch_size": 8192,
    "prefetch_buffer_size": 0,
    "partitions_per_worker": 4
  },
  "data_path": "file:///data2/sf100/",
  "queries": {
    "1": 14.63571572303772,
    "2": 16.11984419822693,
    "3": 20.260254621505737,
    "4": 16.40132737159729,
    "5": 35.4002046585083,
    "6": 7.793532609939575,
    "7": 49.56708884239197,
    "8": 37.00137710571289,
    "9": 60.13660907745361,
    "10": 38.18756365776062,
    "11": 13.499444484710693,
    "12": 18.93906331062317,
    "13": 14.921503782272339,
    "14": 7.416260004043579,
    "15": 2.373532295227051,
    "16": 8.229618549346924,
    "17": 52.57597255706787,
    "18": 85.48271942138672,
    "19": 10.138697862625122,
    "20": 15.182426929473877,
    "21": 78.81208372116089,
    "22": 8.711960792541504
  },
  "local_queries": {
    "1": 14.912381172180176,
    "2": 10.478784322738647,
    "3": 8.960041284561157,
    "4": 3.8824241161346436,
    "5": 15.605360507965088,
    "6": 1.672469139099121,
    "7": 28.076196432113647,
    "8": 14.546991348266602,
    "9": 26.64270520210266,
    "10": 11.699812173843384,
    "11": 4.682126522064209,
    "12": 4.03217339515686,
    "13": 8.454285621643066,
    "14": 2.875070095062256,
    "15": 0.0013363361358642578,
    "16": 2.2461774349212646,
    "17": 26.58483576774597,
    "18": 61.40281629562378,
    "19": 5.444426774978638,
    "20": 7.112048625946045,
    "21": 36.257577657699585,
    "22": 2.517507314682007
  },
  "validated": {
    "1": true,
    "2": true,
    "3": true,
    "4": true,
    "5": true,
    "6": true,
    "7": true,
    "8": true,
    "9": true,
    "10": true,
    "11": true,
    "12": true,
    "13": true,
    "14": true,
    "15": true,
    "16": true,
    "17": true,
    "18": true,
    "19": true,
    "20": true,
    "21": true,
    "22": true
  }
}
```
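To read the `queries` and `local_queries` timings side by side, one option is to compute a per-query ratio of distributed time to local time. The following is a minimal sketch, not part of datafusion-ray: the function name `slowdown_vs_local` is hypothetical, and the sample dictionary copies only three of the timings reported in the results JSON.

```python
def slowdown_vs_local(results):
    """Return {query_id: distributed_seconds / local_seconds}.

    Hypothetical helper (not a datafusion-ray API). Expects a dict shaped
    like the benchmark output, with "queries" and "local_queries" maps
    keyed by query number as a string.
    """
    distributed = results["queries"]
    local = results["local_queries"]
    return {
        q: distributed[q] / local[q]
        for q in sorted(distributed, key=int)  # numeric order: "2" before "10"
        if q in local and local[q] > 0         # skip missing or zero timings
    }


# Three timings copied from the reported results; the full JSON has 22 queries.
sample = {
    "queries": {
        "1": 14.63571572303772,
        "6": 7.793532609939575,
        "18": 85.48271942138672,
    },
    "local_queries": {
        "1": 14.912381172180176,
        "6": 1.672469139099121,
        "18": 61.40281629562378,
    },
}

ratios = slowdown_vs_local(sample)
for q, r in ratios.items():
    print(f"Q{q}: {r:.2f}x local time")
```

A ratio below 1 means the distributed run was faster than the local run for that query. To process the whole result set, the same function can be applied to the full JSON after loading it with `json.load`.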