milenkovicm commented on PR #1333: URL: https://github.com/apache/datafusion-ballista/pull/1333#issuecomment-3444749035
First of all, thanks for the contribution @mach-kernel! I'll try to support you as much as I can with your requirement.

I'll start with a bit of history, as it will help me explain the core direction. Last year we started trimming ballista down to the shape it has today. Previously ballista was a one-size-fits-all solution, with a lot of code sitting in the repo to support various very specific use cases. Instead of a fit-all solution, we decided to make it more generic, with the ability to override or add the code required to support specific cases that are not generic enough for us to commit maintenance effort to. This helps reduce the burden on maintainers.

If use-case-specific behaviour is needed, users can change and compile their own client, scheduler and/or executor. The main reason, as you noted in the Discord discussion, is that we can't just drop a jar on the classpath. This way users can rely on the functionality provided by the core ballista library but extend it to support their own use case.

There are a few examples of extending core functionality, but I would say it's not documented as well as it needs to be. I have created a few more showcase projects, [ballista python](https://github.com/milenkovicm/ballista_python), [ballista extensions](https://github.com/milenkovicm/ballista_extensions) & [ballista delta](https://github.com/milenkovicm/ballista_delta), to demonstrate how to extend ballista to fit a specific use case. I'm not sure if they will help.

You mentioned UDFs. At the moment there are a few different approaches, none of them perfect, and from ballista's perspective we hope we can support all of them, but we do not want to maintain them.

Regarding your code, I will have a better look, but at the moment most of it looks like it could be implemented outside of the core library. You could create your own extension codecs to support your specific tables.
Maybe the missing piece that could be added is the ability to register additional (gRPC) service(s) alongside the core scheduler service, which could support a centralised schema location.

Let me know what you think.
