Re: Question about collation and distribution trait

Maryann Xue Fri, 30 Oct 2015 09:38:41 -0700

I see, James. I think there should be a way of doing this in Calcite. I'll
figure that out.


On Fri, Oct 30, 2015 at 12:15 PM, James Taylor <[email protected]>
wrote:

> Thanks for the info, Maryann. If we can avoid forcing row key order (when
> there's no explicit ordering), we'll get much better performance for
> non-aggregate queries (7.5x last we measured). The
> phoenix.query.force.rowkeyorder option is more for backward compatibility,
> for users of pre-4.4 releases who were depending on row key order even in
> the absence of an ORDER BY clause.
>
> Thanks,
> James
>
> On Fri, Oct 30, 2015 at 8:15 AM, Maryann Xue <[email protected]>
> wrote:
>
> > Thanks a lot, Jacques, for the answer!
> >
> > Julian and James, I made a mistake when bringing up this topic at our
> > sync-up meeting yesterday. In standalone Phoenix+Calcite, it should not be
> > a problem, since the parallel scan of HBase regions is taken care of by
> > Phoenix's ScanPlan, which will do a merge-sort if it sees that the table
> > is salted. The reason I hit a problem in the tests was that I overlooked
> > that the option "phoenix.query.force.rowkeyorder" is set to false by
> > default. We should set it to true in Phoenix+Calcite, to guarantee that
> > our runtime implementation is consistent with the table's collation trait.
> >
> > But it is worth looking at in Drillix (Drill+Phoenix), since the
> > parallel scan and merge are done in Drill. I think Jacques's statement is
> > generally true here, but for some reason I did notice a "Sort" on top of
> > the Drill+Phoenix rel for a select star without order-by. Could anything
> > be wrong here?
> >
> >
> > Thanks,
> > Maryann
> >
> > On Thu, Oct 29, 2015 at 4:42 PM, Jacques Nadeau <[email protected]>
> > wrote:
> >
> > > On the first point, in Drill we treat this as distributed and collated
> > > on primary key. This doesn't cause problems because exchanges are used to
> > > redistribute data (or get it to the client node). Each exchange either
> > > maintains the specific traits or does not.
> > >
> > > On Thu, Oct 29, 2015 at 9:02 AM, Maryann Xue <[email protected]>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I have two questions regarding the Phoenix + Calcite integration:
> > > >
> > > > 1)
> > > > Phoenix has salted tables, which add a hashed value "header" to the
> > > > beginning of the rowkey. Thus salted tables are hash partitioned but
> > > > maintain primary key order within each partition.
> > > > So the question is: how should we describe the collation and
> > > > distribution traits of salted tables? I assume the distribution is just
> > > > HASH_DISTRIBUTED, but is a collation of sorted on PK (the same as for
> > > > regular tables) enough here?
> > > >
> > > > 2)
> > > > Phoenix has an implementation of secondary index called local index,
> > > > which means each partition (region) of the index table is always
> > > > co-located with the corresponding partition (region) of its parent table.
> > > > Is there a way that we could describe this co-location relationship? I
> > > > think it might be useful if we were to have a "local join" operator in
> > > > the future.
> > > >
> > > >
> > > > Thanks,
> > > > Maryann
> > > >
> > >
> >
>
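
The point in the thread about salted tables (hash partitioned, yet sorted by
primary key within each partition) can be illustrated with a small sketch. It
is not Phoenix or Calcite code: salt_bucket is a hypothetical stand-in for
the real salt hash, and the helper names are made up for illustration. It only
shows why a plain concatenation of per-bucket scans does not honor a "sorted
on PK" collation, while a merge-sort across buckets does.

```python
import heapq

def salt_bucket(pk: str, num_buckets: int) -> int:
    # Hypothetical salt function (an assumption, not Phoenix's actual hash).
    return sum(pk.encode()) % num_buckets

def write_salted(pks, num_buckets):
    # Each bucket plays the role of one hash partition; rows inside a
    # bucket stay sorted by primary key.
    buckets = [[] for _ in range(num_buckets)]
    for pk in pks:
        buckets[salt_bucket(pk, num_buckets)].append(pk)
    for bucket in buckets:
        bucket.sort()
    return buckets

def scan_concatenated(buckets):
    # Parallel scan whose results are simply appended bucket by bucket:
    # no global primary key order is guaranteed.
    return [pk for bucket in buckets for pk in bucket]

def scan_merge_sorted(buckets):
    # K-way merge over the already-sorted buckets: restores global
    # primary key order without a full re-sort.
    return list(heapq.merge(*buckets))

keys = ["d", "a", "c", "b", "e"]
buckets = write_salted(keys, 3)
print(scan_concatenated(buckets))
print(scan_merge_sorted(buckets))
```

Under these assumptions, a planner that advertises a PK collation for a salted
table is only correct if the runtime performs the merge step, which is the
consistency concern raised above.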
