Hi Cheng Su and All,

Thanks for your reply. The change I'm attempting would be a significant
philosophical shift in how optimizers currently handle cardinality
estimation. With that in mind, I think it would be wiser to start with a
prototype/proof of concept rather than the traditional pull request and
review workflow.
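
To make that concrete: for the prototype I was picturing starting from
Spark's SparkSessionExtensions hook, so the estimator can live outside
core Catalyst at first. A minimal sketch (the class names are my own
placeholders; the rule body is deliberately empty, not working code from
my method):

    import org.apache.spark.sql.{SparkSession, SparkSessionExtensions}
    import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
    import org.apache.spark.sql.catalyst.rules.Rule

    // Hypothetical rule: the body is a placeholder where the
    // bound-driven join reordering would eventually live.
    case class PessimisticCardinalityRule(spark: SparkSession)
        extends Rule[LogicalPlan] {
      override def apply(plan: LogicalPlan): LogicalPlan = {
        // A real prototype would rewrite join orders here using
        // upper-bound cardinality estimates; for now, pass through.
        plan
      }
    }

    class PessimisticCardinalityExtensions
        extends (SparkSessionExtensions => Unit) {
      override def apply(ext: SparkSessionExtensions): Unit =
        ext.injectOptimizerRule(PessimisticCardinalityRule)
    }

Something like this could be enabled via spark.sql.extensions on an
unmodified Spark build, which would tell us quickly whether deeper
changes to Catalyst's statistics machinery are actually needed.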

For some more context on my method: the central idea of my work is to
lean heavily toward overestimation
<https://dl.acm.org/doi/10.1145/3299869.3319894> during cardinality
estimation, using the elegant entropic bounding
<https://arxiv.org/pdf/1612.02503.pdf> framework. For multi-join queries
in particular, this avoids the underestimation problem that pervades
modern systems. So far my work has focused on single-node databases, and
scaling to multi-node systems presents new hurdles; hence my reaching
out here.
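
To give a flavor of the bounds involved: the simplest instance of this
kind of bound is the AGM bound, which the entropic framework
generalizes. For the triangle query R(a,b) ⋈ S(b,c) ⋈ T(c,a), the
fractional edge cover (1/2, 1/2, 1/2) yields sqrt(|R| * |S| * |T|) as a
guaranteed upper bound on the output size. A toy sketch, purely for
illustration (not my implementation):

    // AGM upper bound for R(a,b) ⋈ S(b,c) ⋈ T(c,a): it may
    // overestimate, but never underestimates the true result size.
    def triangleUpperBound(r: Long, s: Long, t: Long): Double =
      math.sqrt(r.toDouble * s.toDouble * t.toDouble)

    // Three relations of 10^6 rows each give a bound of 10^9 rows;
    // independence-based estimators can undershoot such queries by
    // orders of magnitude on correlated data.
    val bound = triangleUpperBound(1000000L, 1000000L, 1000000L) // 1.0e9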

Thanks,
Walter

On Wed, Apr 21, 2021 at 11:46 PM Cheng Su <chen...@fb.com> wrote:

> Hello Walter,
>
>
>
> Just FYI - https://spark.apache.org/contributing.html is the general
> guide for how to contribute to Spark.
>
>
>
> > implement a prototype modification to Spark's optimizer to
> > exhibit/experiment with some of my PhD work
>
>
>
> Maybe you could share some links or pointers to the work you have done?
> This would help give people some basic ideas so they can provide more
> specific help.
>
>
>
> Thanks,
>
> Cheng Su
>
>
>
> *From: *Walter Cai <wal...@cs.washington.edu>
> *Date: *Wednesday, April 21, 2021 at 6:09 PM
> *To: *"dev@spark.apache.org" <dev@spark.apache.org>
> *Subject: *modifying spark's optimizer for research
>
>
>
> Hi,
>
>
>
> I'm Walter, a PhD student at the University of Washington. My goal is to
> implement a prototype modification to Spark's optimizer to
> exhibit/experiment with some of my PhD work. I was hoping to set up a chat
> with somebody who is familiar with Catalyst and could point me to the best
> place to start modifying.
>
>
>
> Thanks,
>
> Walter
>
> wal...@cs.washington.edu
>
