Thanks a lot, Jacques, for the answer!

Julian and James, I made a mistake when I brought up this topic at our sync-up meeting yesterday. In standalone Phoenix+Calcite it should not be a problem, since the parallel scan of HBase regions is taken care of by Phoenix's ScanPlan, which does a merge-sort if it sees that the table is salted. I hit a problem in the tests because I overlooked that the option "phoenix.query.force.rowkeyorder" is set to false by default. We should set it to true in Phoenix+Calcite, to guarantee that our runtime implementation is consistent with the table's collation trait.
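To make the merge-sort point concrete, here is a minimal sketch of the idea, not Phoenix's actual ScanPlan code. It assumes each salt bucket returns its rows already sorted by primary key, and it merges them with a heap so the combined output keeps the PK order. The class and method names (SaltedMerge, mergeSorted) are hypothetical, and plain strings stand in for row keys.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class SaltedMerge {
    // One heap entry per bucket: the current head key plus the rest of that bucket's scan.
    private static final class Head {
        final String key;
        final Iterator<String> rest;

        Head(String key, Iterator<String> rest) {
            this.key = key;
            this.rest = rest;
        }
    }

    /** Merges per-bucket results, each already sorted by primary key, into one sorted list. */
    public static List<String> mergeSorted(List<List<String>> buckets) {
        PriorityQueue<Head> heap = new PriorityQueue<>((a, b) -> a.key.compareTo(b.key));
        for (List<String> bucket : buckets) {
            Iterator<String> it = bucket.iterator();
            if (it.hasNext()) {
                heap.add(new Head(it.next(), it));
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            // Emit the smallest head, then refill the heap from the same bucket.
            Head h = heap.poll();
            out.add(h.key);
            if (h.rest.hasNext()) {
                heap.add(new Head(h.rest.next(), h.rest));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Three hypothetical salt buckets, each sorted on its own.
        List<List<String>> buckets = List.of(
            List.of("a", "d", "g"),
            List.of("b", "e"),
            List.of("c", "f"));
        System.out.println(mergeSorted(buckets)); // prints [a, b, c, d, e, f, g]
    }
}
```

Simply concatenating the buckets would give a, d, g, b, e, c, f, which is the kind of output that contradicts a "sorted on PK" collation trait.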
But it is worth looking at in Drillix (Drill+Phoenix), since there the parallel scan and merge are done in Drill. I think Jacques's statement is generally true here, but for some reason I did notice a "Sort" on top of the Drill+Phoenix rel for a select star without order-by. Could something be suspicious here?

Thanks,
Maryann

On Thu, Oct 29, 2015 at 4:42 PM, Jacques Nadeau <[email protected]> wrote:

> On the first point, in Drill we treat this as distributed and collated on
> primary key. This doesn't cause problems because exchanges are used to
> redistribute data (or get it to the client node). Each exchange will
> maintain or not the specific traits.
>
> On Thu, Oct 29, 2015 at 9:02 AM, Maryann Xue <[email protected]> wrote:
>
> > Hi,
> >
> > I have two questions regarding the Phoenix + Calcite integration:
> >
> > 1)
> > Phoenix has salted tables which add a hashed value "header" to the
> > beginning of the rowkey. Thus salted tables are hash partitioned but
> > maintains primary key order within each partition.
> > So question is how should we describe the collation and distribution trait
> > of salted tables? I assume distribution is just HASH_DISTRIBUTED, but is
> > the collation of sorted on PK (just the same as regular tables) enough
> > here?
> >
> > 2)
> > Phoenix has a implementation of secondary index called local index, which
> > means each partition (region) of index table is always co-located with the
> > corresponding partition (region) of its parent table.
> > Is there a way that we could describe this co-location relationship? I
> > think it might be useful if we should have a "local join" operator in
> > future.
> >
> > Thanks,
> > Maryann
