[I] exceeded protobuf size of 2gb [datafusion-ray]

via GitHub Mon, 02 Dec 2024 14:05:09 -0800


dlr2 opened a new issue, #51:
URL: https://github.com/apache/datafusion-ray/issues/51


   I'm running some tests on SQL queries that use WHERE, GROUP BY and/or ORDER 
BY phrases. That is no JOINS atm. I was able to create a cluster manually, with 
six nodes that each have 2 GPUS plus a head node. This shows up in the UI as I 
would expect; I also have grafana and prometheus running with it. I am using a 
python client. But some issues:
   
   - In some cases, I get the error: `ray.rpc.RequestWorkerLeaseRequest 
exceeded maximum protobuf size of 2GB"` (I think its looking for 16GB in one 
case)
   - To add, from what I can see all of my queries are processed in one stage. 
And the output shows "Forcing reduce stage concurrency from 1200 to 1" then all 
compute takes place on the first node of the cluster. The 1200 is about the 
number of cores I have across the cluster and the value I used to initialize 
`DatafusionRayContext`. Might be that the query plan deduced doesn't allow for 
parallelism?
       - BTW, it seems odd that after `ray.init` which connects to the cluster 
I then need `DatafusionRayContext(int)` to define how many CPUs I want. I would 
think that the connection object could be passed like 
`DatafusionRayContext(InitializationObject)` and the resulting context would 
then use all of the cluster. Otherwise, I don't see how to get configs from the 
`ray.init` to then set the number of CPUs dynamically -- and there is no option 
for GPUs from connection info -- but maybe datafusion-ray doesn't use GPUs
   - In some cases, where I can simply run datafusion directly, the 
datafusion-ray version (also running on one node per above) can be up to 8x 
slower. 
   
   Unfortunately, I can't provide examples or datasets as my tests are 
internal. 
   
   Thanks!
   
   PS: I hope this is the correct place for this type of feedback. I didn't see 
a forum or anything. And apologies for noting more than one issue in the same 
post. I realize that you are only at version 0.1.0 so I hope you can view this 
as helpful.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[I] exceeded protobuf size of 2gb [datafusion-ray]

Reply via email to