Hi folks,

The deadline for the Google Summer of Code (GSoC) [1] is coming up (19 April), and we want to propose two ideas that students could implement inside Apache Wayang (Incubating). The projects would help them learn the internals of Wayang and also learn about the cost model. The ideas are:
- The first comes from the paper [Expand your Training Limits! Generating Training Data for ML-based Data Management](https://www.agora-ecosystem.com/publications_pdf/expand_training_limits.pdf), where the authors generate training data for an ML model that provides the cost model. For Wayang, this would help generate the data needed to train the current cost model, and it would make it easier for more people to tune the model.
- The second comes from [Zero-Shot Cost Models for Out-of-the-box Learned Cost Prediction](https://arxiv.org/pdf/2201.00561.pdf), where the idea is to start from a pre-trained model that keeps learning as new queries arrive. This could help people who cannot wait for a model to be trained, and it would also help to build a model that does not need to be calibrated.

If you have another idea, we can add it as well :D, the deadline

Best regards,
Bertty

[1] https://summerofcode.withgoogle.com
