Hi Cheng Su and All,

Thanks for your reply. The change I'm attempting would be a significant philosophical shift in how optimizers currently handle cardinality estimation. With that in mind, I think it would be wiser to start with a prototype/proof of concept rather than the traditional pull request and review workflow.

For some more context on my method: the central idea of my work is to lean heavily towards overestimation <https://dl.acm.org/doi/10.1145/3299869.3319894> during cardinality estimation, using the elegant entropic bounding <https://arxiv.org/pdf/1612.02503.pdf> framework. For multi-join queries in particular, this avoids the underestimation problem that pervades modern systems. So far my work has focused on single-node databases; scaling the approach to multi-node systems presents new hurdles, which is why I'm here.
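To make this concrete, here is a minimal sketch (in Scala, since Catalyst is written in it) of the simplest instance of these bounds, the AGM bound for a triangle join, next to the textbook independence-assumption estimate it would replace. The statistics are made-up numbers and BoundSketch is just an illustration, not my actual implementation:

object BoundSketch {
  // Hypothetical statistics: relation cardinalities and per-attribute
  // distinct-value counts for the triangle join R(a,b) |><| S(b,c) |><| T(c,a).
  val (rSize, sSize, tSize) = (100000.0, 100000.0, 100000.0)
  val (distinctA, distinctB, distinctC) = (1000.0, 1000.0, 1000.0)

  // Textbook estimate: multiply the cardinalities and divide by the
  // distinct count of each join attribute (independence + uniformity
  // assumptions). On correlated data this routinely underestimates.
  val independenceEstimate: Double =
    rSize * sSize * tSize / (distinctA * distinctB * distinctC)

  // AGM bound: a fractional edge cover of the triangle query assigns
  // weight 1/2 to each relation, so |Q| <= (|R|*|S|*|T|)^(1/2). This
  // is a guaranteed upper bound -- it can only ever overestimate.
  val agmUpperBound: Double = math.sqrt(rSize * sSize * tSize)

  def main(args: Array[String]): Unit = {
    println(f"independence estimate: $independenceEstimate%.0f rows") // 1000000
    println(f"AGM upper bound:       $agmUpperBound%.0f rows")        // ~31622777
  }
}

The full entropic framework generalizes this bound (e.g., to exploit degree constraints), but the property that matters for the optimizer carries over: the estimate can only err on the high side.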
Thanks,
Walter

On Wed, Apr 21, 2021 at 11:46 PM Cheng Su <chen...@fb.com> wrote:

> Hello Walter,
>
> Just FYI - https://spark.apache.org/contributing.html is the general
> guide for how to contribute to Spark.
>
> > implement a prototype modification to Spark's optimizer to
> > exhibit/experiment with some of my PhD work
>
> Maybe you could share some links or pointers for the work you have done?
> This can help give people some basic ideas and provide more specific help.
>
> Thanks,
> Cheng Su
>
> From: Walter Cai <wal...@cs.washington.edu>
> Date: Wednesday, April 21, 2021 at 6:09 PM
> To: "dev@spark.apache.org" <dev@spark.apache.org>
> Subject: modifying spark's optimizer for research
>
> Hi,
>
> I'm Walter, a PhD student at the University of Washington. My goal is to
> implement a prototype modification to Spark's optimizer to
> exhibit/experiment with some of my PhD work. I was hoping to set up a chat
> with somebody who is familiar with Catalyst and the best place to start
> modifying.
>
> Thanks,
> Walter
> wal...@cs.washington.edu