mbutrovich commented on PR #4335: URL: https://github.com/apache/datafusion-comet/pull/4335#issuecomment-4462040690
Thanks for putting this together @karuppayya! We're definitely thinking about the same problem right now. :)

One thing I'd like to push on before we go further. A fair amount of this PR is distribution and lifecycle wiring (the per-catalog broadcast cache, threading the broadcast through `CometExecRDD` and the operator, the `IcebergReflection.getCatalogName` lookup that keys the cache, and the `initialize(catalogProperties)` step that exists so the broadcast carries config). That feels like vendor territory to me rather than Comet territory. A vendor credential manager can already integrate with Spark's own credential distribution mechanisms, run its own broadcast, or stash state in a lazy singleton, none of which Comet has to know about. If the SPI is `ServiceLoader`-discovered, every executor finds the provider on its own classpath and the broadcast layer goes away.

My concern is scope. Comet's value is accelerating compute-intensive operations in native code, and I'd rather we not take on responsibility for reimplementing Spark's credential plumbing inside Comet when vendors are already equipped to handle it. #4309 takes the narrower position of owning only the JNI shape and the per-call context, and leaves caching, refresh, and distribution to the vendor implementation.

Is there a scenario you have in mind that #4309's shape cannot cover?
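For concreteness, here's a minimal sketch of the `ServiceLoader` shape I have in mind. The interface name and method are hypothetical (not from this PR or #4309); the point is only that each executor resolves the vendor implementation from its own classpath, so no broadcast is needed:

```java
import java.util.ServiceLoader;

// Hypothetical SPI for illustration only; the real interface shape
// would be whatever Comet exposes to vendors.
interface CredentialProvider {
    String credentialFor(String catalogName);
}

final class CredentialProviders {
    // Each JVM (driver or executor) discovers the vendor implementation
    // via the standard META-INF/services mechanism. Caching, refresh, and
    // any cross-node coordination stay inside the vendor's provider.
    static CredentialProvider load() {
        for (CredentialProvider p : ServiceLoader.load(CredentialProvider.class)) {
            return p; // first provider on the classpath wins
        }
        throw new IllegalStateException(
            "no CredentialProvider registered on the classpath");
    }
}
```

The vendor ships a jar containing the implementation plus a `META-INF/services/CredentialProvider` entry, and Comet never has to distribute state on its behalf.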
