Hi Danny,

Yes, I agree that we don't need two variations of hash joins
(EnumerableHashJoin and EnumerableThetaHashJoin) as it is the code right
now.

Having that said, I am not sure if we can really handle the general case of
theta joins with hash-based algorithms.

The approach in [3] does not really handle general theta joins but allows
to split a theta condition into equi and non-equi parts where the non-equi
part is executed as a post-join filter.
Since the post-join filter is part of the join operator we can say that we
have a theta hash join operator but this is partially true.

For instance, the following query:

SELECT e1.name
FROM emp e1
INNER JOIN emp e2
  ON e1.salary < e2.salary

cannot use at all hash join algorithm [3] because there is no equality
condition.

I wouldn't mind keeping EnumerableHashJoin only for equijoins and execute
the post-join filter as such but this would make outer joins
difficult/impossible to handle so I believe [3] is a good step forward.

Best,
Stamatis





On Wed, Apr 24, 2019 at 6:28 AM Yuzhao Chen <[email protected]> wrote:

> Julian,
>
> I also agree with you that we should distinguish between
> EnumerableHashJoin and EnumerableMergeJoin, what i want to address about is
> if we should extend the EnumerableHashJoin and EnumerableMergeJoin to let
> them support Thera join conditions except the EquiJoin conditions.
>
> It seems that Stamatis also agree that we can extend the existing
> EnumerableXXXs, correct me if i have wrong understanding.
>
>
> Best,
> Danny Chan
> 在 2019年4月24日 +0800 AM2:18,Julian Hyde <[email protected]>,写道:
> > I agree with Stamatis.
> >
> > Let’s suppose we have a merge join and hash join in enumerable
> convention. The merge join requires its input to be sorted, the hash join
> does not. They can both perform equi-join. The merge join can also perform
> joins like
> >
> > FROM orders
> > JOIN shipments
> > ON shipments.shipDate BETWEEN orders.orderDate
> > AND orders.orderDate + INTERVAL ‘7’ DAY
> >
> > Should we use the same sub-class of RelNode, EnumerableJoin, for both,
> or should we create sub-classes EnumerableHashJoin and EnumerableMergeJoin?
> >
> > I think we should do the latter:
> > * They are clearly different algorithms, and we generate different code
> for them
> > * They have different cost models
> > * They have different input and output traits (the merge join requires
> sorted inputs and produces a sorted output)
> > * If we have an equi-join, we might want to generate both options and
> see how they compare (the merge join might win if one or both of the inputs
> are sorted), and both algorithms used the EnumerableJoin class we would not
> be able to distinguish the RelNode instances.
> >
> > Julian
> >
> >
> > > On Apr 23, 2019, at 11:06 AM, Stamatis Zampetakis <[email protected]>
> wrote:
> > >
> > > In the literature, there are many algorithms to perform a join and the
> vast
> > > majority of them cannot handle theta conditions (either for performance
> > > reasons, or because the algorithm was simply not meant for it).
> > >
> > > For the existing EnumerableXXXJoin variants, it may be possible to
> extend
> > > the algorithm to handle theta joins without structural changes to the
> > > algorithm (as I think it is the case in [3]).
> > > For these cases, I agree with you it does not make sense to have many
> > > variants (e.g., EnumerableThetaHashJoin seems redundant).
> > >
> > > In the future we may decide to use other hash-based implementations
> with
> > > certain applicability restrictions (such as apply only on equijoins)
> but
> > > let's not worry too much about that now.
> > >
> > > Regarding the rules, I think it is useful to have different variants.
> > > Consider for example a system that does not have join processing
> algorithms
> > > that can treat theta-joins; pushing non-equi conditions in a join seems
> > > undesirable in this case.
> > >
> > > For more information regarding join processing, there are a few
> interesting
> > > (and old) surveys [4, 5] which outline some of the most typical
> algorithms
> > > that are used in relational databases.
> > >
> > > Best,
> > > Stamatis
> > >
> > > [4] Query evaluation techniques for Large databases (
> > > https://web.stanford.edu/class/cs346/2014/graefe.pdf)
> > > [5] Join processing in relational databases (
> > > https://www.csd.uoc.gr/~hy460/pdf/p63-mishra.pdf)
> > >
> > >
> > > On Tue, Apr 23, 2019 at 8:05 AM Yuzhao Chen <[email protected]>
> wrote:
> > >
> > > > Thx, Julian
> > > >
> > > > Why not just support non-equi join condition for every physical
> algorithm,
> > > > it does not make much sense if we have both HashJoin and a
> HashTheraJoin,
> > > > cause a HashThataJoin with empty non-equi join condition is same as a
> > > > HashJoin.
> > > >
> > > > And we can remove the limitations in the rule like FilterJoinRule.
> > > >
> > > > Best,
> > > >
> > > > Danny Chan
> > > > 在 2019年4月23日 +0800 AM3:21,[email protected],写道:
> > > > >
> > > > > If there are limitations, over time we would like to remove those
> > > > limitations, but we will probably do it by adding new algorithms, and
> > > > therefore new EnumerableXxx classes.
> > > >
> >
>

Reply via email to