Re: Question about collation and distribution trait

James Taylor Fri, 30 Oct 2015 09:19:11 -0700

Thanks for the info, Maryann. If we can avoid forcing row key order (when
there's no explicit ordering), we'll get much better performance for
non-aggregate queries (7.5x the last time we measured). The
phoenix.query.force.rowkeyorder option is more for backward compatibility:
it is for users of pre-4.4 releases who were depending on row key order even
in the absence of an ORDER BY clause.
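
To make the trade-off concrete, here is a minimal sketch, not Phoenix code.
It assumes the option can be supplied as a client-side connection property
(the property name is the one quoted in this thread; the class and method
names are made up for illustration). The toy merge shows why a salted table
needs a merge-sort across buckets to return rows in primary key order, which
is the work we skip when the order is not forced.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Properties;

// Hypothetical illustration class; nothing here is part of Phoenix itself.
class RowKeyOrderSketch {

    // Property name as quoted in this thread.
    static final String FORCE_ROW_KEY_ORDER = "phoenix.query.force.rowkeyorder";

    /**
     * Builds client properties that could be handed to
     * DriverManager.getConnection(url, props). Assumption: the option is
     * read from connection properties.
     */
    public static Properties clientProps(boolean forceRowKeyOrder) {
        Properties props = new Properties();
        props.setProperty(FORCE_ROW_KEY_ORDER, Boolean.toString(forceRowKeyOrder));
        return props;
    }

    /**
     * Toy k-way merge. Each inner list stands for one salt bucket whose rows
     * are already sorted by primary key; the result is globally sorted.
     */
    public static List<String> mergeBuckets(List<List<String>> buckets) {
        // Heap entries are {bucketIndex, positionInBucket}, ordered by the key they point at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (a, b) -> buckets.get(a[0]).get(a[1]).compareTo(buckets.get(b[0]).get(b[1])));
        for (int i = 0; i < buckets.size(); i++) {
            if (!buckets.get(i).isEmpty()) {
                heap.add(new int[] {i, 0});
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            List<String> bucket = buckets.get(top[0]);
            out.add(bucket.get(top[1]));
            // Advance within the same bucket if it has more rows.
            if (top[1] + 1 < bucket.size()) {
                heap.add(new int[] {top[0], top[1] + 1});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Three salt buckets, each sorted on its own but not relative to the others.
        List<List<String>> buckets = Arrays.asList(
            Arrays.asList("a", "d"),
            Arrays.asList("b", "e"),
            Arrays.asList("c"));
        System.out.println(clientProps(true));
        System.out.println(mergeBuckets(buckets));
    }
}
```

Without the merge, a client would just see the buckets in whatever order the
parallel scans return them, which is cheaper but carries no ordering
guarantee.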


Thanks,
James

On Fri, Oct 30, 2015 at 8:15 AM, Maryann Xue <[email protected]> wrote:

> Thanks a lot, Jacques, for the answer!
>
> Julian and James, I made a mistake when bringing up this topic yesterday at
> our sync-up meeting. In standalone Phoenix+Calcite, it should not be a
> problem since the parallel scan of HBase regions will be taken care of by
> Phoenix's ScanPlan, which will do a merge-sort if it sees that the table is
> salted. The reason I hit a problem in the tests was that I overlooked that
> the option "phoenix.query.force.rowkeyorder" is set to false by default.
> We should set it to true in Phoenix+Calcite, to guarantee that our runtime
> implementation is consistent with the table's collation trait.
>
> But it is worth looking at in Drillix (Drill+Phoenix), since there the
> parallel scan and merge are done in Drill. I think Jacques's statement is
> generally true here, but for some reason I did notice there was a "Sort" on
> top of the Drill+Phoenix rel for a select star without order-by. Is there
> anything suspicious here?
>
>
> Thanks,
> Maryann
>
> On Thu, Oct 29, 2015 at 4:42 PM, Jacques Nadeau <[email protected]>
> wrote:
>
> > On the first point, in Drill we treat this as distributed and collated on
> > primary key. This doesn't cause problems because exchanges are used to
> > redistribute data (or get it to the client node). Each exchange either
> > maintains the specific traits or does not.
> >
> > On Thu, Oct 29, 2015 at 9:02 AM, Maryann Xue <[email protected]>
> > wrote:
> >
> > > Hi,
> > >
> > > I have two questions regarding the Phoenix + Calcite integration:
> > >
> > > 1)
> > > > Phoenix has salted tables, which add a hashed value "header" to the
> > > > beginning of the rowkey. Thus salted tables are hash partitioned but
> > > > maintain primary key order within each partition.
> > > > So the question is: how should we describe the collation and
> > > > distribution traits of salted tables? I assume the distribution is just
> > > > HASH_DISTRIBUTED, but is a collation of "sorted on PK" (the same as for
> > > > regular tables) enough here?
> > >
> > > 2)
> > > > Phoenix has an implementation of secondary indexes called local index,
> > > > which means each partition (region) of the index table is always
> > > > co-located with the corresponding partition (region) of its parent table.
> > > > Is there a way that we could describe this co-location relationship? I
> > > > think it might be useful if we were to have a "local join" operator in
> > > > the future.
> > >
> > >
> > > Thanks,
> > > Maryann
> > >
> >
>
