See PHOENIX-2207. We can better parallelize the fetching of the data on the query-node/client-side (without needing to buffer it all) when we don't care about the order we get it back.
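To make the difference concrete, here is a minimal sketch of the two consumption patterns. It is an illustration only, not Phoenix's actual implementation: `BUCKETS`, `scan_bucket`, `scan_ordered`, and `scan_unordered` are hypothetical names, and the in-memory lists stand in for per-salt-bucket scans that are each already sorted by row key.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical stand-in for a salted table: each bucket is sorted by row key
# internally, but the buckets interleave with one another.
BUCKETS = [
    ["a1", "a4", "a7"],
    ["a2", "a5", "a8"],
    ["a3", "a6", "a9"],
]

def scan_bucket(rows):
    """Pretend to scan one bucket; returns its rows in row key order."""
    return list(rows)

def scan_ordered(buckets):
    """Row key order: wait for every bucket's result, then merge-sort them."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(scan_bucket, buckets))
    return list(heapq.merge(*results))

def scan_unordered(buckets):
    """No ordering: take each bucket's rows as soon as that scan finishes."""
    out = []
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(scan_bucket, b) for b in buckets]
        for fut in as_completed(futures):
            out.extend(fut.result())
    return out
```

In the ordered case the slowest bucket gates the output; in the unordered case whichever scan completes first can be handed to the caller immediately, which is where the extra parallelism comes from.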
On Fri, Oct 30, 2015 at 12:55 PM, Maryann Xue <[email protected]> wrote:
> Think having another physical TableScan operator, say,
> "PhoenixTableScanUnordered" should be enough to solve this problem. But I'm
> just curious, James, why is there such a big performance difference?
>
> On Fri, Oct 30, 2015 at 12:36 PM, Maryann Xue <[email protected]>
> wrote:
>
>> I see, James. Think there should be a way of doing this in Calcite. I'll
>> figure that out.
>>
>> On Fri, Oct 30, 2015 at 12:15 PM, James Taylor <[email protected]>
>> wrote:
>>
>>> Thanks for the info, Maryann. If we can avoid forcing row key order (when
>>> there's no explicit ordering), we'll get much better performance for non
>>> aggregate queries (7.5x last we measured). The phoenix.query.force.rowkeyorder
>>> is more for backward compatibility - for users of pre 4.4 releases who were
>>> depending on row key order even in the absence of an order by clause.
>>>
>>> Thanks,
>>> James
>>>
>>> On Fri, Oct 30, 2015 at 8:15 AM, Maryann Xue <[email protected]>
>>> wrote:
>>>
>>> > Thanks a lot, Jacques, for the answer!
>>> >
>>> > Julian and James, I made a mistake when bringing up this topic yesterday at
>>> > our sync up meeting. In standalone Phoenix+Calcite, it should not be a
>>> > problem since the parallel scan of HBase regions will be taken care of by
>>> > Phoenix's ScanPlan, which will do a merge-sort if it sees that the table is
>>> > salted. The reason why I hit a problem in the tests was that I ignored that
>>> > the option "phoenix.query.force.rowkeyorder" was set to false by default.
>>> > We should set it as true in Phoenix+Calcite, to guarantee that our runtime
>>> > implementation is consistent with the table's collation trait.
>>> >
>>> > But it is a thing worth looking at in Drillix (Drill+Phoenix), since the
>>> > parallel scan and merge is done in Drill. I think Jacques's statement is
>>> > generally true here, but for some reason I did notice there was a "Sort" on
>>> > top of the Drill+Phoenix rel for a select star without order-by. Anything
>>> > might be suspicious here?
>>> >
>>> >
>>> > Thanks,
>>> > Maryann
>>> >
>>> > On Thu, Oct 29, 2015 at 4:42 PM, Jacques Nadeau <[email protected]>
>>> > wrote:
>>> >
>>> > > On the first point, in Drill we treat this as distributed and collated on
>>> > > primary key. This doesn't cause problems because exchanges are used to
>>> > > redistribute data (or get it to the client node). Each exchange will
>>> > > maintain or not the specific traits.
>>> > >
>>> > > On Thu, Oct 29, 2015 at 9:02 AM, Maryann Xue <[email protected]>
>>> > > wrote:
>>> > >
>>> > > > Hi,
>>> > > >
>>> > > > I have two questions regarding the Phoenix + Calcite integration:
>>> > > >
>>> > > > 1)
>>> > > > Phoenix has salted tables which add a hashed value "header" to the
>>> > > > beginning of the rowkey. Thus salted tables are hash partitioned but
>>> > > > maintains primary key order within each partition.
>>> > > > So question is how should we describe the collation and distribution trait
>>> > > > of salted tables? I assume distribution is just HASH_DISTRIBUTED, but is
>>> > > > the collation of sorted on PK (just the same as regular tables) enough
>>> > > > here?
>>> > > >
>>> > > > 2)
>>> > > > Phoenix has a implementation of secondary index called local index, which
>>> > > > means each partition (region) of index table is always co-located with the
>>> > > > corresponding partition (region) of its parent table.
>>> > > > Is there a way that we could describe this co-location relationship? I
>>> > > > think it might be useful if we should have a "local join" operator in
>>> > > > future.
>>> > > >
>>> > > >
>>> > > > Thanks,
>>> > > > Maryann
>>> > > >
>>> > >
>>> >
>>>
>>
>>
>
