Think having another physical TableScan operator, say, "PhoenixTableScanUnordered" should be enough to solve this problem. But I'm just curious, James, why is there such a big performance difference?
On Fri, Oct 30, 2015 at 12:36 PM, Maryann Xue <[email protected]> wrote: > I see, James. Think there should be a way of doing this in Calcite. I'll > figure that out. > > On Fri, Oct 30, 2015 at 12:15 PM, James Taylor <[email protected]> > wrote: > >> Thanks for the info, Maryann. If we can avoid forcing row key order (when >> there's no explicit ordering), we'll get much better performance for non >> aggregate queries (7.5x last we measured). The >> phoenix.query.force.rowkeyorder >> is more for backward compatibility - for users of pre 4.4 releases who >> were >> depending on row key order even in the absence of an order by clause. >> >> Thanks, >> James >> >> On Fri, Oct 30, 2015 at 8:15 AM, Maryann Xue <[email protected]> >> wrote: >> >> > Thanks a lot, Jacques, for the answer! >> > >> > Julian and James, I made a mistake when bringing up this topic >> yesterday at >> > our sync up meeting. In standalone Phoenix+Calcite, it should not be a >> > problem since the parallel scan of HBase regions will be taken care of >> by >> > Phoenix's ScanPlan, which will do a merge-sort if it sees that the >> table is >> > salted. The reason why I hit a problem in the tests was that I ignored >> that >> > the option "phoenix.query.force.rowkeyorder" was set to false by >> default. >> > We should set it as true in Phoenix+Calcite, to guarantee that our >> runtime >> > implementation is consistent with the table's collation trait. >> > >> > But it is a thing worth looking at in Drillix (Drill+Phoenix), since the >> > parallel scan and merge is done in Drill. I think Jacques's statement is >> > generally true here, but for some reason I did notice there was a >> "Sort" on >> > top of the Drill+Phoenix rel for a select star without order-by. >> Anything >> > might be suspicious here? >> > >> > >> > Thanks, >> > Maryann >> > >> > On Thu, Oct 29, 2015 at 4:42 PM, Jacques Nadeau <[email protected]> >> > wrote: >> > >> > > On the first point, in Drill we treat this as distributed and >> collated on >> > > primary key. This doesn't cause problems because exchanges are used to >> > > redistribute data (or get it to the client node). Each exchange will >> > > maintain or not the specific traits. >> > > >> > > On Thu, Oct 29, 2015 at 9:02 AM, Maryann Xue <[email protected]> >> > > wrote: >> > > >> > > > Hi, >> > > > >> > > > I have two questions regarding the Phoenix + Calcite integration: >> > > > >> > > > 1) >> > > > Phoenix has salted tables which add a hashed value "header" to the >> > > > beginning of the rowkey. Thus salted tables are hash partitioned but >> > > > maintains primary key order within each partition. >> > > > So question is how should we describe the collation and distribution >> > > trait >> > > > of salted tables? I assume distribution is just HASH_DISTRIBUTED, >> but >> > is >> > > > the collation of sorted on PK (just the same as regular tables) >> enough >> > > > here? >> > > > >> > > > 2) >> > > > Phoenix has a implementation of secondary index called local index, >> > which >> > > > means each partition (region) of index table is always co-located >> with >> > > the >> > > > corresponding partition (region) of its parent table. >> > > > Is there a way that we could describe this co-location >> relationship? I >> > > > think it might be useful if we should have a "local join" operator >> in >> > > > future. >> > > > >> > > > >> > > > Thanks, >> > > > Maryann >> > > > >> > > >> > >> > >
