Re: Question about collation and distribution trait

James Taylor Fri, 30 Oct 2015 13:13:35 -0700

See PHOENIX-2207. We can better parallelize the fetching of the data on the
query-node/client-side (without needing to buffer it all) when we don't
care about the order we get it back.


On Fri, Oct 30, 2015 at 12:55 PM, Maryann Xue <[email protected]> wrote:

> I think having another physical TableScan operator, say
> "PhoenixTableScanUnordered", should be enough to solve this problem. But
> I'm just curious, James: why is there such a big performance difference?
>
> On Fri, Oct 30, 2015 at 12:36 PM, Maryann Xue <[email protected]>
> wrote:
>
>> I see, James. I think there should be a way of doing this in Calcite.
>> I'll figure that out.
>>
>> On Fri, Oct 30, 2015 at 12:15 PM, James Taylor <[email protected]>
>> wrote:
>>
>>> Thanks for the info, Maryann. If we can avoid forcing row key order (when
>>> there's no explicit ordering), we'll get much better performance for
>>> non-aggregate queries (7.5x last we measured). The
>>> phoenix.query.force.rowkeyorder option is more for backward
>>> compatibility: it is for users of pre-4.4 releases who were depending on
>>> row key order even in the absence of an order by clause.
>>>
>>> Thanks,
>>> James
>>>
>>> On Fri, Oct 30, 2015 at 8:15 AM, Maryann Xue <[email protected]>
>>> wrote:
>>>
>>> > Thanks a lot, Jacques, for the answer!
>>> >
>>> > Julian and James, I made a mistake when bringing up this topic
>>> > yesterday at our sync-up meeting. In standalone Phoenix+Calcite, it
>>> > should not be a problem, since the parallel scan of HBase regions is
>>> > taken care of by Phoenix's ScanPlan, which does a merge-sort if it sees
>>> > that the table is salted. The reason I hit a problem in the tests was
>>> > that I overlooked that the option "phoenix.query.force.rowkeyorder" is
>>> > set to false by default. We should set it to true in Phoenix+Calcite to
>>> > guarantee that our runtime implementation is consistent with the
>>> > table's collation trait.
>>> >
>>> > But it is worth looking at in Drillix (Drill+Phoenix), since the
>>> > parallel scan and merge are done in Drill. I think Jacques's statement
>>> > is generally true here, but for some reason I did notice a "Sort" on
>>> > top of the Drill+Phoenix rel for a select star without an order-by.
>>> > Is anything suspicious here?
>>> >
>>> >
>>> > Thanks,
>>> > Maryann
>>> >
>>> > On Thu, Oct 29, 2015 at 4:42 PM, Jacques Nadeau <[email protected]>
>>> > wrote:
>>> >
>>> > > On the first point, in Drill we treat this as distributed and
>>> > > collated on primary key. This doesn't cause problems because
>>> > > exchanges are used to redistribute data (or get it to the client
>>> > > node). Each exchange either maintains the specific traits or does
>>> > > not.
>>> > >
>>> > > On Thu, Oct 29, 2015 at 9:02 AM, Maryann Xue <[email protected]>
>>> > > wrote:
>>> > >
>>> > > > Hi,
>>> > > >
>>> > > > I have two questions regarding the Phoenix + Calcite integration:
>>> > > >
>>> > > > 1)
>>> > > > Phoenix has salted tables, which add a hashed value "header" to the
>>> > > > beginning of the rowkey. Thus salted tables are hash partitioned
>>> > > > but maintain primary key order within each partition.
>>> > > > So the question is: how should we describe the collation and
>>> > > > distribution traits of salted tables? I assume the distribution is
>>> > > > just HASH_DISTRIBUTED, but is a collation of "sorted on PK" (the
>>> > > > same as for regular tables) enough here?
>>> > > >
>>> > > > 2)
>>> > > > Phoenix has an implementation of secondary indexes called the local
>>> > > > index, which means each partition (region) of the index table is
>>> > > > always co-located with the corresponding partition (region) of its
>>> > > > parent table.
>>> > > > Is there a way we could describe this co-location relationship? I
>>> > > > think it might be useful if we were to have a "local join" operator
>>> > > > in the future.
>>> > > >
>>> > > >
>>> > > > Thanks,
>>> > > > Maryann
>>> > > >
>>> > >
>>> >
>>>
>>
>>
>
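
For readers following the thread, here is a minimal sketch of the trade-off discussed above: a salted table is hash partitioned into buckets that are each sorted by primary key, so a scan can either merge-sort the buckets to honour the collation trait or simply return bucket contents as they come. Everything named here is an assumption made for illustration (SALT_BUCKETS, salt_of, build_salted_table, scan_ordered, scan_unordered); the CRC32 hash stands in for whatever hash Phoenix really uses, and the code does not call any Phoenix, Calcite, or Drill API.

```python
import heapq
import zlib
from itertools import chain

SALT_BUCKETS = 4  # hypothetical bucket count, chosen only for this sketch


def salt_of(row_key: str) -> int:
    """Map a row key to a bucket. Illustrative hash only, not Phoenix's own."""
    return zlib.crc32(row_key.encode("utf-8")) % SALT_BUCKETS


def build_salted_table(row_keys):
    """Hash partition the keys, keeping primary key order inside each bucket."""
    buckets = [[] for _ in range(SALT_BUCKETS)]
    for key in row_keys:
        buckets[salt_of(key)].append(key)
    for bucket in buckets:
        bucket.sort()
    return buckets


def scan_ordered(buckets):
    """K-way merge of the sorted buckets: output is in global primary key order."""
    return list(heapq.merge(*buckets))


def scan_unordered(buckets):
    """Concatenate buckets without merging: no global order is guaranteed."""
    return list(chain.from_iterable(buckets))


keys = [f"row{i:03d}" for i in range(20)]
table = build_salted_table(keys)
ordered = scan_ordered(table)
unordered = scan_unordered(table)
```

The ordered scan has to compare the head of every bucket before it can emit a row, which is roughly why forcing row key order constrains how results are fetched. The unordered scan returns the same rows but makes no promise about their order, so it is only safe when the query has no explicit ordering.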
