Re: Adding/Removing regionservers

Jonathan Gray Tue, 07 Jul 2009 16:30:52 -0700

Yes, you could multi-thread your scanners.

You could query for region information to get the start/stop rows for the regions in the table, and then spin up a scanner in each thread for each region.
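
A rough sketch of that fan-out, using only the JDK so it runs on its own. Everything here is an assumption for illustration: the TreeMap stands in for the table, the class and method names are made up, and the range boundaries are hard-coded where a real client would read them from the table's region information and open one scanner per task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelRangeScan {

    // Scans [startRow, stopRow) of the stand-in table; an empty stopRow means "to the end".
    static List<String> scanRange(NavigableMap<String, String> table,
                                  String startRow, String stopRow) {
        NavigableMap<String, String> slice = stopRow.isEmpty()
                ? table.tailMap(startRow, true)
                : table.subMap(startRow, true, stopRow, false);
        return new ArrayList<>(slice.keySet());
    }

    // Submits one scan task per {start, stop} pair and merges the results in range order.
    static List<String> scanAllRanges(NavigableMap<String, String> table,
                                      String[][] ranges, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> pending = new ArrayList<>();
            for (String[] range : ranges) {
                Callable<List<String>> task = () -> scanRange(table, range[0], range[1]);
                pending.add(pool.submit(task));
            }
            List<String> merged = new ArrayList<>();
            for (Future<List<String>> f : pending) {
                merged.addAll(f.get()); // blocks until that range's scan has finished
            }
            return merged;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("range scan failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (String k : new String[] {"a|1", "c|2", "m|3", "p|4", "z|5"}) {
            table.put(k, "row " + k);
        }
        // Hypothetical region boundaries: ["", "h"), ["h", "q"), ["q", end).
        String[][] ranges = {{"", "h"}, {"h", "q"}, {"q", ""}};
        System.out.println(scanAllRanges(table, ranges, 3));
    }
}
```

Because the ranges do not overlap and the futures are read back in submission order, the merged list comes out in row-key order without a separate sort. The map is only read, never modified, while the tasks run.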

If you plan on doing anything like that, keep me / the list in the loop; I would be willing to help out.


JG

llpind wrote:

Thanks for the link, that sounds good.

If I multi-thread scanners will HBase performance speed up as more boxes are
added?

For example, in the above example I had:
for (String typeVal : list){
  Scan tblAScan = new Scan(Bytes.toBytes(typeVal + "|"),
      Bytes.toBytes(typeVal + "|A")); //give me all IDs for matching TYPE|VAL
  ResultScanner s1 = tblA.getScanner(tblAScan);
  for (Result tblBRowResult = s1.next(); tblBRowResult != null;
      tblBRowResult = s1.next()){
    Scan tblBScan = new Scan(Bytes.toBytes(tblBRowResult.getValue() ),
        Bytes.toBytes(typeVal + " ")); //IDs are all numeric
    ResultScanner s2 = tblB.getScanner(tblBScan);
    List<KeyValue> results = s2.next().list(); //only care about column data here, since ID is row key
    for (KeyValue kv : results){
      //do stuff
      kv.getValue();
    }
  }
}
======================================
I have since modified it to use a Get (not updated above). I'm thinking each
pass of the outer loop (which creates a new scan) could run in a different
Thread, with the results combined at the end?
I'm looking for ways to increase performance by adding boxes.  How can I
spread the scanner load, so it's not waiting for the next iteration?



Jonathan Gray-2 wrote:
Sounds about right.  You seem to have a good grip on things.
0.20 will work with millions of columns in a row, but currently there is no way to return the massive row in segments. If the data is big enough, you'll have memory allocation issues. Scanners are still a safer way to go until we have intra-row scanning: https://issues.apache.org/jira/browse/HBASE-1537
JG

llpind wrote:
Thanks for the tips.

Yeah, that is the model we had before; the problem is that we can potentially
have millions of IDs for a given TYPE|VAL.
We are considering something like:
Row Key: TYPE|VALUE|ID
column: link:TYPE|VALUE

This is only because ID may never have more than a few TYPE|VAL results in
this current dataset, which would also eliminate the need to go to the second
table.

Thanks for the help.

Jonathan Gray-2 wrote:
Well you're trying to do a join. How much data is actually in TableB? You might consider denormalizing so that you don't have to query TableB, the data you need is already in TableA.
You could use a Get (single trip) for the inner loop rather than a Scanner (which requires multiple round-trips). You could even use a Get for the outer loop by making your table wide instead of tall.
Row Key:  TYPE|VALUE
Column: link:ID

And you have a column for each ID within that TYPE|VALUE row.
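
To make the wide layout concrete, here is a minimal sketch in plain Java with nested collections standing in for HBase rows and columns. The class name, method name, and sample keys are hypothetical; it only shows how tall row keys of the form TYPE|VALUE|ID fold into one row per TYPE|VALUE, so that a single lookup returns every ID.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

final class WideRowSketch {

    // Folds tall row keys (TYPE|VALUE|ID) into wide rows keyed by TYPE|VALUE,
    // with one "link:ID" column name per ID.
    static Map<String, SortedSet<String>> toWide(Collection<String> tallRowKeys) {
        Map<String, SortedSet<String>> wide = new TreeMap<>();
        for (String key : tallRowKeys) {
            int cut = key.lastIndexOf('|');
            if (cut < 0) {
                throw new IllegalArgumentException("expected TYPE|VALUE|ID, got: " + key);
            }
            String typeVal = key.substring(0, cut);
            String id = key.substring(cut + 1);
            wide.computeIfAbsent(typeVal, k -> new TreeSet<>()).add("link:" + id);
        }
        return wide;
    }

    public static void main(String[] args) {
        Map<String, SortedSet<String>> wide = toWide(
                Arrays.asList("color|red|42", "color|red|1001", "color|blue|7"));
        // One lookup per TYPE|VALUE, analogous to a single Get on the wide row.
        System.out.println(wide.get("color|red"));
    }
}
```

The trade-off is that the whole set of IDs for a TYPE|VALUE comes back in one piece, which is convenient when rows stay small and costly when a row grows very large.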

Also, don't forget to close your scanners if you do use scanners.

JG


llpind wrote:
Assume a schema like so:
TableA======================
Row Key:  TYPE|VALUE|ID
Column:  link:ID  (irrelevant)
TableB======================
Row Key: ID
Column: typeval:TYPE|VALUE
===========================



I need to iterate over TableA using a Scanner to get all IDs based on
TYPE|VALUE, then for each ID I need to get from TableB which TYPE|VALUEs
it is tied to (a many-to-many relationship).
Assume I have a list of TYPE|VALUES in a List, and need to process through
this data.  I have done something like this:



for (String typeVal : list){

  Scan tblAScan = new Scan(Bytes.toBytes(typeVal + "|"),
      Bytes.toBytes(typeVal + "|A"));  //give me all IDs for matching TYPE|VAL
  ResultScanner s1 = tblA.getScanner(tblAScan);

  for (Result tblBRowResult = s1.next(); tblBRowResult != null;
      tblBRowResult = s1.next()){

    Scan tblBScan = new Scan(Bytes.toBytes(tblBRowResult.getValue() ),
        Bytes.toBytes(typeVal + " "));  //IDs are all numeric
    ResultScanner s2 = tblB.getScanner(tblBScan);
    List<KeyValue> results = s2.next().list();  //only care about column data here, since ID is row key

    for (KeyValue kv : results){
      //do stuff
      kv.getValue();
    }

  }

}
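
The TableA scan bounds rely on sort order: a start key of TYPE|VALUE plus "|" and a stop key of TYPE|VALUE plus "|A" bracket every row whose ID starts with a digit, because digits sort before 'A'. A small sketch of that, with a sorted set of strings standing in for the table's row keys (for ASCII keys, string order matches byte order). The names and sample keys are hypothetical, and stopKeyForPrefix is an illustrative helper that assumes a non-empty prefix whose last character can be incremented.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;

final class PrefixRangeSketch {

    // Rows for typeVal whose ID part sorts before 'A', i.e. IDs starting with a digit.
    static NavigableSet<String> numericIdRows(NavigableSet<String> rowKeys, String typeVal) {
        return rowKeys.subSet(typeVal + "|", true, typeVal + "|A", false);
    }

    // Smallest key that sorts after every key starting with prefix:
    // the prefix with its last character incremented by one.
    static String stopKeyForPrefix(String prefix) {
        char last = prefix.charAt(prefix.length() - 1);
        return prefix.substring(0, prefix.length() - 1) + (char) (last + 1);
    }

    // Every row starting with prefix, whatever the ID looks like.
    static NavigableSet<String> allRowsWithPrefix(NavigableSet<String> rowKeys, String prefix) {
        return rowKeys.subSet(prefix, true, stopKeyForPrefix(prefix), false);
    }

    public static void main(String[] args) {
        NavigableSet<String> keys = new TreeSet<>(Arrays.asList(
                "color|blue|3", "color|red|1001", "color|red|42",
                "color|red|x9", "color|reddish|7"));
        System.out.println(numericIdRows(keys, "color|red"));
        System.out.println(allRowsWithPrefix(keys, "color|red|"));
    }
}
```

Note that the trailing "|" in the start key matters: without it, rows for a longer value such as color|reddish would fall inside the range.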
