Hello guys,

Thank you, Bertty, for the reminder about GSoC.
Overall, although I like the ideas, I do not see how they would integrate into Wayang, as we do not have an ML-based optimizer yet. They would therefore remain standalone solutions rather than being truly integrated into Wayang. Perhaps I am missing the main point behind both proposals; if so, please let me know.

I would therefore suggest implementing the "ML-based Cross-Platform Query Optimization" paper published at ICDE 2020:
https://ieeexplore.ieee.org/document/9101757

This project would allow us to integrate our ML-based optimizer into Wayang. Once this is done, Bertty's proposed ideas make a lot of sense. I would even say that Bertty's first proposal (implementing our training data generator) could be proposed in the same project as the one I propose above.

What do you think?

Best,
Jorge

On Thu, Apr 14, 2022 at 1:21 PM Rodrigo Pardo Meza <[email protected]> wrote:

> +1 sounds cool, we should definitely implement those ideas soon
>
> On Wed, Apr 13, 2022 at 14:55, Alexander Alten (<[email protected]>) wrote:
>
> > +1 from my side :)
> >
> > > On 13. Apr 2022, at 14:54, Bertty Contreras <[email protected]> wrote:
> > >
> > > Hi folks,
> > >
> > > The deadline for the Google Summer of Code (GSoC) [1] is coming up
> > > (19 of April), and we want to apply with two ideas that students could
> > > implement inside Apache Wayang (Incubating). It will help them learn
> > > the internals of Wayang and also learn about the cost model. The ideas
> > > are:
> > >
> > > - the first is the paper [Expand your Training Limits! Generating
> > > Training Data for ML-based Data Management](
> > > https://www.agora-ecosystem.com/publications_pdf/expand_training_limits.pdf
> > > ), where the authors try to generate data for training an ML model
> > > that will provide the cost model; this would help with generating data
> > > to train the cost model of the current model, and it would help more
> > > people tune their model.
> > >
> > > - the second idea comes from [Zero-Shot Cost Models for Out-of-the-box
> > > Learned Cost Prediction](https://arxiv.org/pdf/2201.00561.pdf), where
> > > the idea is to create a pre-trained model that keeps learning as new
> > > queries come in; this could help people that can wait for having a
> > > training model, and also help to build a model that does not need to
> > > be calibrated.
> > >
> > > If you have another idea, we can also add it :D, the deadline
> > >
> > > Best regards,
> > > Bertty
> > >
> > > [1] https://summerofcode.withgoogle.com
