Re: Question about collation and distribution trait

Maryann Xue Fri, 30 Oct 2015 08:16:39 -0700

Thanks a lot, Jacques, for the answer!

Julian and James, I made a mistake when bringing up this topic yesterday at
our sync up meeting. In standalone Phoenix+Calcite, it should not be a
problem since the parallel scan of HBase regions will be taken care of by
Phoenix's ScanPlan, which will do a merge-sort if it sees that the table is
salted. The reason why I hit a problem in the tests was that I ignored that
the option "phoenix.query.force.rowkeyorder" was set to false by default.
We should set it as true in Phoenix+Calcite, to guarantee that our runtime
implementation is consistent with the table's collation trait.


But it is a thing worth looking at in Drillix (Drill+Phoenix), since the
parallel scan and merge is done in Drill. I think Jacques's statement is
generally true here, but for some reason I did notice there was a "Sort" on
top of the Drill+Phoenix rel for a select star without order-by. Anything
might be suspicious here?


Thanks,
Maryann

On Thu, Oct 29, 2015 at 4:42 PM, Jacques Nadeau <[email protected]> wrote:

> On the first point, in Drill we treat this as distributed and collated on
> primary key. This doesn't cause problems because exchanges are used to
> redistribute data (or get it to the client node). Each exchange will
> maintain or not the specific traits.
>
> On Thu, Oct 29, 2015 at 9:02 AM, Maryann Xue <[email protected]>
> wrote:
>
> > Hi,
> >
> > I have two questions regarding the Phoenix + Calcite integration:
> >
> > 1)
> > Phoenix has salted tables which add a hashed value "header" to the
> > beginning of the rowkey. Thus salted tables are hash partitioned but
> > maintains primary key order within each partition.
> > So question is how should we describe the collation and distribution
> trait
> > of salted tables? I assume distribution is just HASH_DISTRIBUTED, but is
> > the collation of sorted on PK (just the same as regular tables) enough
> > here?
> >
> > 2)
> > Phoenix has a implementation of secondary index called local index, which
> > means each partition (region) of index table is always co-located with
> the
> > corresponding partition (region) of its parent table.
> > Is there a way that we could describe this co-location relationship? I
> > think it might be useful if we should have a "local join" operator in
> > future.
> >
> >
> > Thanks,
> > Maryann
> >
>

Re: Question about collation and distribution trait

Reply via email to