[ 
https://issues.apache.org/jira/browse/COMDEV-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bertty Contreras updated COMDEV-473:
------------------------------------
    Labels: gsoc gsoc2022 machine_learning mentor  (was: gsoc gsoc2022 
machine_learning)

> Apache Wayang(Incubating): Cost Model Learner Using Machine learning
> --------------------------------------------------------------------
>
>                 Key: COMDEV-473
>                 URL: https://issues.apache.org/jira/browse/COMDEV-473
>             Project: Community Development
>          Issue Type: New Feature
>          Components: GSoC/Mentoring ideas
>            Reporter: Bertty Contreras
>            Priority: Critical
>              Labels: gsoc, gsoc2022, machine_learning, mentor
>   Original Estimate: 350h
>  Remaining Estimate: 350h
>
> *Synopsis*
> The current Apache Wayang (Incubating) uses a cost model to select the right 
> set of platforms while optimizing query plans. The initial cost model often 
> becomes ineffective over time, and the cost model must then be recalibrated. 
> The goal is to create an ML pipeline that triggers the calibration of the 
> cost model automatically and uses the logs of previous query executions to 
> refine the cost model so that it follows the workload that interacts with 
> the Apache Wayang (Incubating) environment.
>  
> *Benefits to Community*
> The community will gain an AI pipeline for automatic, dynamic cost model 
> calibration in query optimizers; we will use Apache Wayang(Incubating) as our 
> playground. As a result, the experience of Apache Wayang(Incubating) users 
> will improve, because the pipeline helps them automatically tune their cost 
> models and adapt to the current query workload.
>  
> *Deliverables*
> The expected deliverable is an adaptation of the paper "Zero-Shot Cost Models 
> for Out-of-the-box Learned Cost Prediction"[1], where the authors assume an 
> ML-Cost-Model. In this case, however, the idea needs modifications to run in 
> the current setup of Apache Wayang(Incubating).
>  
> The expected steps are the following:
>  * Understand the paper [1]
>  * Get familiar with the cost model of Apache Wayang
>  * Discuss and design the process for the dynamic cost-model
>  * Implement the dynamic cost-model feature
>  
> *Related Work*
> [1] Zero-Shot Cost Models for Out-of-the-box Learned Cost Prediction: 
> https://arxiv.org/pdf/2201.00561.pdf
> [2] RHEEMix in the data jungle: a cost-based optimizer for cross-platform 
> systems: https://wayang.apache.org/assets/pdf/paper/journal_vldb.pdf
>  
> *Biographical Information of possible mentor*
> Bertty Contreras-Rojas is a Senior Software Engineer at Databloom Inc. He is 
> a member of the PPMC of Apache Wayang(Incubating). He has many years of 
> experience developing data-intensive processing systems for several 
> industries, such as 
> banking systems. He was a research engineer at the Qatar Computing Research 
> Institute, where he was responsible for developing the declarative query 
> engine for Rheem and adding new underlying platforms to Rheem.
>  
> Rodrigo Pardo-Meza is a Senior Software Engineer at Databloom Inc. He is a 
> member of the PPMC of Apache Wayang(Incubating). He has many years of 
> experience 
> developing applications that support Big Data processing, with experience 
> implementing ETL processes over distributed systems to optimize inventories 
> in supply chains. He was a research engineer at the Qatar Computing Research 
> Institute, where he specialized in human interface interaction with big data 
> analytics. During this time, he co-developed an ML-based cross-platform query 
> optimizer.
>  
> Jorge Quiané is the head of the Big Data Systems research group at the Berlin 
> Institute for the Foundations of Learning and Data (BIFOLD) and a Principal 
> Researcher at DIMA (TU Berlin). He also acts as the Scientific Coordinator of 
> the IAM group at the German Research Center for Artificial Intelligence 
> (DFKI). His current research is in the broad area of big data: mainly in 
> federated data analytics, scalable data infrastructures, and distributed 
> query processing. He has published numerous research papers on data 
> management and novel system architectures. He has recently been honoured with 
> the 2022 ACM SIGMOD Research Highlight Award and the Best Paper Award at ICDE 
> 2021 for his work on “Efficient Control Flow in Dataflow Systems”. He holds 
> five patents in core database areas and on machine learning. Earlier in his 
> career, he was a Senior Scientist at the Qatar Computing Research Institute 
> (QCRI) and a Postdoctoral Researcher at Saarland University. He obtained his 
> PhD in computer science from INRIA (Nantes University).
>  
> *Name and Contact Information*
> Name: Bertty Contreras-Rojas
> email: bertty (at) apache.org
> community: dev (at) wayang.apache.org
> website: [https://wayang.apache.org|https://wayang.apache.org/]
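
As a rough illustration of the calibration idea in the Synopsis (refining cost-model parameters from logs of previous query executions), here is a minimal sketch. It is not the zero-shot approach of [1] and does not use any Apache Wayang API: it substitutes a plain least-squares fit of a linear cost function, runtime ≈ alpha * cardinality + beta, per (operator, platform) pair. The `ExecutionLog` record, its fields, and the function names are all hypothetical and exist only for this example.

```python
# Hypothetical sketch: none of these names come from Apache Wayang.
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ExecutionLog:
    operator: str      # assumed operator name, e.g. "map"
    platform: str      # assumed platform name, e.g. "spark"
    cardinality: int   # number of input records observed
    runtime_ms: float  # measured execution time in milliseconds


def fit_linear_cost(logs: List[ExecutionLog]) -> Tuple[float, float]:
    """Fit runtime_ms ~= alpha * cardinality + beta by ordinary least squares."""
    n = len(logs)
    if n < 2:
        raise ValueError("need at least two log entries to fit a line")
    mean_x = sum(log.cardinality for log in logs) / n
    mean_y = sum(log.runtime_ms for log in logs) / n
    sxx = sum((log.cardinality - mean_x) ** 2 for log in logs)
    if sxx == 0:
        raise ValueError("all cardinalities are identical; slope is undefined")
    sxy = sum(
        (log.cardinality - mean_x) * (log.runtime_ms - mean_y) for log in logs
    )
    alpha = sxy / sxx
    beta = mean_y - alpha * mean_x
    return alpha, beta


def calibrate(
    logs: List[ExecutionLog],
) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Group logs by (operator, platform) and fit one cost function per group."""
    groups: Dict[Tuple[str, str], List[ExecutionLog]] = {}
    for log in logs:
        groups.setdefault((log.operator, log.platform), []).append(log)
    return {key: fit_linear_cost(entries) for key, entries in groups.items()}
```

Re-running `calibrate` on a sliding window of recent logs would be one simple way to let the parameters follow the workload; how the fitted values feed back into the optimizer is left open here, since it depends on the design discussed in the steps above.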



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@community.apache.org
For additional commands, e-mail: dev-h...@community.apache.org
