milenkovicm commented on PR #1333:
URL: 
https://github.com/apache/datafusion-ballista/pull/1333#issuecomment-3444749035

   First of all, thanks for the contribution, @mach-kernel! I'll try to support
you as much as I can with your requirements.
   
   I'll start with a bit of history, as it will help me explain the core
direction. Last year we started trimming ballista down to the shape it has
today. Previously, ballista was a one-size-fits-all solution: the repo carried
a lot of code to support various very specific use cases. Instead of a
one-size-fits-all solution, we decided to make it more generic, with the
ability to override or add the code required for specific cases that are not
generic enough for us to allocate maintenance effort to. This helps reduce the
burden on maintainers.
   
   If use-case-specific behaviour is needed, users can modify and compile
their own client, scheduler and/or executor. The main reason, as you stated in
the Discord discussion, is that we can't just drop a jar on the class path.
This way users can rely on the functionality provided by the core ballista
library but extend it to support their own use case.
   
   There are a few examples of extending the core functionality, but I would
say it's not documented as well as it needs to be. I have created a few more
showcase projects, [ballista 
python](https://github.com/milenkovicm/ballista_python), [ballista 
extensions](https://github.com/milenkovicm/ballista_extensions) & [ballista 
delta](https://github.com/milenkovicm/ballista_delta), to demonstrate how to 
extend ballista to fit a specific use case; I'm not sure if they will help.
   
   You mentioned UDFs. At the moment there are a few different approaches,
none of them perfect. From the perspective of ballista, we hope we can support
all of them, but we do not want to maintain them.
   
   Regarding your code, I will have a better look, but at the moment most of
it looks like it can be implemented outside the core library. You could
create your own extension codecs to support your specific tables. Maybe the
missing part that could be added is registering additional gRPC service(s)
alongside the core scheduler service, which could support a centralised schema
location.
   
   Let me know what you think.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


