On 2017-12-15 10:57, Riccardo Tommasini <[email protected]> wrote:
> A final remark on efficiency. All the mentioned features are very interesting
> but not computationally nice. Idk what's calcite position on this, but in a
> big data community i think we should be careful. SPARQL did some crazy stuff
> and is still paying them (see gutierrez papers)

I don't worry too much about efficiency per se, but I focus on making the algebra clean enough that we can recognize the simple cases. I would make the algebra simple and moderately powerful at first, and make it more expressive/powerful later only if we have use cases that need it.

It's analogous to joins. Calcite's join operator can express theta-joins, left/right/full outer, semi-joins, and joins over sorted/bucketed data sets, but we can easily recognize an inner equi-join when we see one. Most of the transformation rules are written on inner equi-joins first, and generalized to other kinds of joins when we're sure it's safe. I imagine us taking a similar path with iteration, e.g. when is it safe to push a filter into, and through, an iteration?

Lastly, because it's algebra we take a 'RISC' approach. The "Iterate" operator doesn't have to do everything; we can have other operators such as Aggregate, Filter, Project before, after and inside it.
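
To make the join analogy concrete, here is a minimal sketch in Python of "recognizing the simple case" inside a general operator. It is a toy model, not Calcite's API: the classes `ColumnRef`, `Compare` and `Join` and the functions `is_inner_equi_join` and `equi_keys` are hypothetical names made up for illustration, and the join condition is assumed to be already split into a conjunction of binary comparisons.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


# Hypothetical toy model of a general join operator (not Calcite's classes).
@dataclass(frozen=True)
class ColumnRef:
    side: str  # which input the column comes from: "left" or "right"
    name: str


@dataclass(frozen=True)
class Compare:
    op: str  # "=", "<", ">", ...
    left: ColumnRef
    right: ColumnRef


@dataclass(frozen=True)
class Join:
    join_type: str                  # "inner", "left", "right", "full", "semi"
    conjuncts: Tuple[Compare, ...]  # condition, read as an AND of comparisons


def is_inner_equi_join(join: Join) -> bool:
    """True only for the simple case: inner join, all conjuncts are
    equalities between one left-input column and one right-input column."""
    if join.join_type != "inner" or not join.conjuncts:
        return False
    return all(
        c.op == "=" and {c.left.side, c.right.side} == {"left", "right"}
        for c in join.conjuncts
    )


def equi_keys(join: Join) -> Optional[Tuple[List[str], List[str]]]:
    """Return (left_keys, right_keys) for the simple case, else None.

    A rule written only for inner equi-joins would call this first and
    simply not fire when it returns None.
    """
    if not is_inner_equi_join(join):
        return None
    left_keys: List[str] = []
    right_keys: List[str] = []
    for c in join.conjuncts:
        # Normalize so the left-input column always comes first.
        l, r = (c.left, c.right) if c.left.side == "left" else (c.right, c.left)
        left_keys.append(l.name)
        right_keys.append(r.name)
    return left_keys, right_keys


# Usage: the same operator type carries both the simple and the general case.
simple = Join("inner", (Compare("=", ColumnRef("left", "deptno"),
                                ColumnRef("right", "deptno")),))
theta = Join("inner", (Compare("<", ColumnRef("left", "sal"),
                               ColumnRef("right", "hisal")),))
outer = Join("left", simple.conjuncts)
```

Here `equi_keys(simple)` yields the key columns, while `equi_keys(theta)` and `equi_keys(outer)` return `None`, so a rule guarded this way leaves the general cases untouched until someone has shown it is safe to extend it. The same guard-then-generalize pattern is what I have in mind for iteration.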
