[ https://issues.apache.org/jira/browse/COMDEV-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bertty Contreras updated COMDEV-473:
------------------------------------
    Labels: gsoc gsoc2022 machine_learning mentor  (was: gsoc gsoc2022 machine_learning)

> Apache Wayang(Incubating): Cost Model Learner Using Machine learning
> --------------------------------------------------------------------
>
> Key: COMDEV-473
> URL: https://issues.apache.org/jira/browse/COMDEV-473
> Project: Community Development
> Issue Type: New Feature
> Components: GSoC/Mentoring ideas
> Reporter: Bertty Contreras
> Priority: Critical
> Labels: gsoc, gsoc2022, machine_learning, mentor
> Original Estimate: 350h
> Remaining Estimate: 350h
>
> *Synopsis*
> Apache Wayang (Incubating) currently uses a cost model to select the right
> set of platforms while optimizing query plans. The initial cost model often
> becomes ineffective over time, and the cost model must then be calibrated
> again. The goal is to create an ML pipeline that starts the calibration of
> the cost model automatically and uses the logs of previous query executions
> to refine the cost model so that it follows the workload that interacts with
> the Apache Wayang (Incubating) environment.
>
> *Benefits to Community*
> The community will gain an AI pipeline for automatic, dynamic cost model
> calibration in query optimizers. We will use Apache Wayang (Incubating) as
> our playground. As a result, the experience of Apache Wayang (Incubating)
> users will improve, because the pipeline helps them automatically tune their
> cost models and adapt to the current query workload.
>
> *Deliverables*
> The expected deliverable is an adaptation of the paper "Zero-Shot Cost Models
> for Out-of-the-box Learned Cost Prediction" [1], whose authors assume an ML
> cost model. In this case, however, the idea needs modifications to run in
> the current setup of Apache Wayang (Incubating).
>
> The expected steps are the following:
> * Understand the paper [1]
> * Get familiar with the cost model of Apache Wayang
> * Discuss and design the process for the dynamic cost model
> * Implement the dynamic cost model feature
>
> *Related Work*
> [1] [Zero-Shot Cost Models for Out-of-the-box Learned Cost Prediction|https://arxiv.org/pdf/2201.00561.pdf]
> [2] [RHEEMix in the data jungle: a cost-based optimizer for cross-platform systems|https://wayang.apache.org/assets/pdf/paper/journal_vldb.pdf]
>
> *Biographical Information of Possible Mentors*
> Bertty Contreras-Rojas is a Senior Software Engineer at Databloom Inc. He is
> a member of the Apache Wayang (Incubating) PPMC. He has many years of
> experience developing data-intensive processing systems for several
> industries, such as banking. He was a research engineer at the Qatar
> Computing Research Institute, where he was responsible for developing the
> declarative query engine for Rheem and adding new underlying platforms to
> Rheem.
>
> Rodrigo Pardo-Meza is a Senior Software Engineer at Databloom Inc. He is a
> member of the Apache Wayang (Incubating) PPMC. He has many years of
> experience developing applications that support Big Data processing,
> including implementing ETL processes over distributed systems to optimize
> inventories in supply chains. He was a research engineer at the Qatar
> Computing Research Institute, where he specialized in human-interface
> interaction with big data analytics. During this time, he co-developed an
> ML-based cross-platform query optimizer.
>
> Jorge Quiané is the head of the Big Data Systems research group at the Berlin
> Institute for the Foundations of Learning and Data (BIFOLD) and a Principal
> Researcher at DIMA (TU Berlin). He also acts as the Scientific Coordinator of
> the IAM group at the German Research Center for Artificial Intelligence
> (DFKI). His current research is in the broad area of big data, mainly
> federated data analytics, scalable data infrastructures, and distributed
> query processing. He has published numerous research papers on data
> management and novel system architectures. He has recently been honoured with
> the 2022 ACM SIGMOD Research Highlight Award and the Best Paper Award at ICDE
> 2021 for his work on "Efficient Control Flow in Dataflow Systems". He holds
> five patents in core database areas and on machine learning. Earlier in his
> career, he was a Senior Scientist at the Qatar Computing Research Institute
> (QCRI) and a Postdoctoral Researcher at Saarland University. He obtained his
> PhD in computer science from INRIA (Nantes University).
>
> *Name and Contact Information*
> Name: Bertty Contreras-Rojas
> email: bertty (at) apache.org
> community: dev (at) wayang.apache.org
> website: [https://wayang.apache.org|https://wayang.apache.org/]

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@community.apache.org
For additional commands, e-mail: dev-h...@community.apache.org