Hello guys,

Thank you, Bertty, for the reminder about GSoC.

Overall, although I like the ideas, I do not see how they would
integrate into Wayang, as we do not have an ML-based optimizer yet.
They would therefore remain standalone solutions rather than being
integrated into Wayang. Perhaps I am missing the main point behind
both proposals; if so, please let me know.

Instead, I would suggest implementing the "ML-based Cross-Platform
Query Optimization" paper published at ICDE 2020:
https://ieeexplore.ieee.org/document/9101757

This project would allow us to integrate our ML-based optimizer into
Wayang. Once this is done, Bertty's proposed ideas make a lot of sense.
I would even say that Bertty's first proposal (implementing our training
data generator) could be proposed as part of the same project as the one
I propose above.

What do you think?

Best,
Jorge

On Thu, Apr 14, 2022 at 1:21 PM Rodrigo Pardo Meza <[email protected]>
wrote:

> +1, sounds cool. We should definitely implement those ideas soon.
>
> On Wed, Apr 13, 2022 at 14:55, Alexander Alten (<[email protected]>)
> wrote:
>
> > +1 from my side :)
> >
> > > On 13. Apr 2022, at 14:54, Bertty Contreras <[email protected]>
> wrote:
> > >
> > > Hi folks,
> > >
> > > The deadline for Google Summer of Code (GSoC) [1] is coming up (19
> > > April), and we want to propose two ideas that students could
> > > implement inside Apache Wayang (Incubating). They would help students
> > > learn the internals of Wayang and its cost model. The ideas are:
> > >
> > > - The first is based on the paper [Expand your Training Limits!
> > > Generating Training Data for ML-based Data Management](
> > > https://www.agora-ecosystem.com/publications_pdf/expand_training_limits.pdf),
> > > where the authors generate data for training an ML model that
> > > provides the cost model. This would help generate the data needed to
> > > train the current cost model, and it would make it easier for more
> > > people to tune the model.
> > >
> > > - The second idea comes from [Zero-Shot Cost Models for Out-of-the-box
> > > Learned Cost Prediction](https://arxiv.org/pdf/2201.00561.pdf), where
> > > the idea is to start from a pre-trained model that keeps learning as
> > > new queries arrive. This could help people who cannot wait for a
> > > trained model, and it would also help build a model that does not
> > > need to be calibrated.
> > >
> > > If you have another idea, we can add it as well :D, the deadline
> > >
> > > Best regards,
> > > Bertty
> > >
> > > [1] https://summerofcode.withgoogle.com
> >
> >
>
