Re: Question about collation and distribution trait

Maryann Xue Fri, 30 Oct 2015 12:56:46 -0700

I think having another physical TableScan operator, say,
"PhoenixTableScanUnordered", should be enough to solve this problem. But I'm
just curious, James: why is there such a big performance difference?


On Fri, Oct 30, 2015 at 12:36 PM, Maryann Xue <[email protected]> wrote:

> I see, James. I think there should be a way of doing this in Calcite. I'll
> figure that out.
>
> On Fri, Oct 30, 2015 at 12:15 PM, James Taylor <[email protected]>
> wrote:
>
>> Thanks for the info, Maryann. If we can avoid forcing row key order (when
>> there's no explicit ordering), we'll get much better performance for
>> non-aggregate queries (7.5x last we measured). The
>> phoenix.query.force.rowkeyorder option is more for backward compatibility -
>> for users of pre-4.4 releases who were depending on row key order even in
>> the absence of an ORDER BY clause.
>>
>> Thanks,
>> James
>>
>> On Fri, Oct 30, 2015 at 8:15 AM, Maryann Xue <[email protected]>
>> wrote:
>>
>> > Thanks a lot, Jacques, for the answer!
>> >
>> > Julian and James, I made a mistake when bringing up this topic yesterday
>> > at our sync-up meeting. In standalone Phoenix+Calcite, it should not be a
>> > problem, since the parallel scan of HBase regions will be taken care of by
>> > Phoenix's ScanPlan, which will do a merge-sort if it sees that the table
>> > is salted. The reason I hit a problem in the tests was that I overlooked
>> > that the option "phoenix.query.force.rowkeyorder" is set to false by
>> > default. We should set it to true in Phoenix+Calcite, to guarantee that
>> > our runtime implementation is consistent with the table's collation
>> > trait.
>> >
>> > But it is worth looking at in Drillix (Drill+Phoenix), since the parallel
>> > scan and merge are done in Drill. I think Jacques's statement is
>> > generally true here, but for some reason I did notice a "Sort" on top of
>> > the Drill+Phoenix rel for a select star without an order-by. Could
>> > anything be suspicious here?
>> >
>> >
>> > Thanks,
>> > Maryann
>> >
>> > On Thu, Oct 29, 2015 at 4:42 PM, Jacques Nadeau <[email protected]>
>> > wrote:
>> >
>> > > On the first point, in Drill we treat this as distributed and collated
>> > > on primary key. This doesn't cause problems because exchanges are used
>> > > to redistribute data (or get it to the client node). Each exchange
>> > > either maintains the specific traits or does not.
>> > >
>> > > On Thu, Oct 29, 2015 at 9:02 AM, Maryann Xue <[email protected]>
>> > > wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > I have two questions regarding the Phoenix + Calcite integration:
>> > > >
>> > > > 1)
>> > > > Phoenix has salted tables, which add a hashed value "header" to the
>> > > > beginning of the rowkey. Thus salted tables are hash partitioned but
>> > > > maintain primary key order within each partition.
>> > > > So the question is: how should we describe the collation and
>> > > > distribution traits of salted tables? I assume the distribution is
>> > > > just HASH_DISTRIBUTED, but is a collation of "sorted on PK" (the same
>> > > > as for regular tables) enough here?
>> > > >
>> > > > 2)
>> > > > Phoenix has an implementation of secondary indexes called local index,
>> > > > which means each partition (region) of the index table is always
>> > > > co-located with the corresponding partition (region) of its parent
>> > > > table.
>> > > > Is there a way we could describe this co-location relationship? I
>> > > > think it might be useful if we were to have a "local join" operator in
>> > > > the future.
>> > > >
>> > > >
>> > > > Thanks,
>> > > > Maryann
>> > > >
>> > >
>> >
>>
>
>
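
The salted-table behaviour discussed in this thread (rows hash-partitioned
into buckets, each bucket ordered by primary key, and a merge-sort across
buckets when row key order is required) can be illustrated with a small toy
model. This is only a sketch under stated assumptions, not Phoenix's or
Calcite's actual code: the bucket count, the CRC32 stand-in hash, and all
function names below are hypothetical and chosen purely for illustration.

```python
import heapq
import zlib

SALT_BUCKETS = 4  # hypothetical bucket count, for illustration only


def salt_bucket(pk: str, buckets: int = SALT_BUCKETS) -> int:
    """Pick a bucket from a hash of the primary key.

    CRC32 is a stand-in here; it is NOT Phoenix's real salting function.
    """
    return zlib.crc32(pk.encode("utf-8")) % buckets


def build_salted_table(pks, buckets: int = SALT_BUCKETS):
    """Hash-partition the keys, keeping each partition sorted by primary key."""
    table = [[] for _ in range(buckets)]
    for pk in pks:
        table[salt_bucket(pk, buckets)].append(pk)
    for partition in table:
        partition.sort()
    return table


def scan_unordered(table):
    """Concatenate partition scans: cheap, but generally not in PK order."""
    return [pk for partition in table for pk in partition]


def scan_row_key_order(table):
    """K-way merge of the sorted partitions, restoring global PK order."""
    return list(heapq.merge(*table))


rows = [f"row{i:03d}" for i in range(20)]
table = build_salted_table(rows)
print(scan_unordered(table))
print(scan_row_key_order(table))
```

In this toy model, advertising a "sorted on PK" collation for the table is
only honest if the scan is the merging one; the concatenating scan returns
the same rows but makes no ordering promise, which is roughly the distinction
an unordered physical scan operator would capture.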
