[ https://issues.apache.org/jira/browse/PHOENIX-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934551#comment-13934551 ]
Lars Hofhansl commented on PHOENIX-111: --------------------------------------- Talked to James at length off line. For salted tables we need not worry about this, since we'd parallize along salt buckets anyway. For matching a key to a schema, we need to confirm that the region start/end keys are always real keys (except first or last region of when we're salting). I believe that to be case: When a region is split HBase finds the optimal key that would the region in 1/2. So all start/end keys should be real keys that at some point existed in some HFile. > Improve intra-region parallelization > ------------------------------------ > > Key: PHOENIX-111 > URL: https://issues.apache.org/jira/browse/PHOENIX-111 > Project: Phoenix > Issue Type: Bug > Reporter: James Taylor > Assignee: Lars Hofhansl > Attachments: phoenix-111.txt > > > The manner in which Phoenix parallelizes queries is explained in some detail > in the Parallelization section here: > http://phoenix-hbase.blogspot.com/2013/02/phoenix-knobs-dials.html > It's actually not that important to understand all the details. In the case > where we try to parallelize within a region, we rely on the HBase > Bytes.split() method (in DefaultParallelIteratorRegionSplitter) to split, > based on the start and end key of the region. We basically use that method to > come up with the start row and stop row of scans that will all run in > parallel across that region. > The problem is, we haven't really tested this method, and I have my doubts > about it, especially when two keys are of different length. The first thing > that should be done is to write a few simple, independent tests using > Bytes.split() directly to confirm whether or not there's a problem: > 1. Write some simple tests to see if Bytes.split() works as expected. Does it > work for two keys that are of different lengths? If not, we can likely take > two keys and make them the same length through padding b/c we know the > structure of the row key. The better we choose the split points to get even > distribution, the better our parallelization will be. > 2. One case that I know will be problematic is when a table is salted. In > that case, we pre-split the table into N regions, where N is the > SALT_BUCKETS=<N> value. The problem in this case is that the Bytes.split() > points are going to be terrible, because it's not taking into account the > possible values of the row key. For example, imagine you have a table like > this: > {code} > CREATE TABLE foo(k VARCHAR PRIMARY KEY) SALT_BUCKETS=4 > {code} > In this case, we'll pre-split the table and have the following region > boundaries: 0-1, 1-2, 2-3, 3-4 > What will be the Bytes.split() for these region boundaries? It would chunk it > up into even byte boundaries which is not ideal, because the VARCHAR value > would most likely be ascii characters in a range of 'A' to 'z'. We'd be much > better off if we took into account the data types of the row key when we > calculate these split points. > So the second thing to do is make some simple improvements to the start/stop > key we pass Bytes.split() that take into account the data type of each column > that makes up the primary key. > For Phoenix 5.0, we'll collect stats and drive this off of those, but for > now, there's likely a few simple things we could do to make a big improvement. -- This message was sent by Atlassian JIRA (v6.2#6252)