mbutrovich commented on PR #4309: URL: https://github.com/apache/datafusion-comet/pull/4309#issuecomment-4522023467
> I guess the JNI call to `getCredentialsForPath` runs through the Comet tokio runtime, which is sized at `spark.executor.cores` worker threads. The call seems to be synchronous. Since the call duration is non-deterministic and entirely controlled by the vendor's implementation, this can potentially block unrelated work on the runtime today and as the system grows. Do you think this is an issue?

I think there are still opportunities to figure out how to get better parallelism and hide I/O latency in Comet's execution model, but yeah, right now it's fairly restricted. I think at least for the OpenDAL/Iceberg case we have a knob you can tune to fire off more tasks for data loading, which I think would introduce parallelism on this path.
