Yes, you could multi-thread your scanners.

You could query for region information to get the start/stop rows for the regions in the table, and then spin up a scanner in each thread for each region.
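A minimal sketch of that fan-out in plain Java. The region boundaries and table rows here are stub data (in HBase you'd pull the start/stop keys from HTable.getStartEndKeys() and open each range with table.getScanner(new Scan(startRow, stopRow)) — those calls are not shown so the sketch runs standalone); the threading/merge shape is the point:

```java
import java.util.*;
import java.util.concurrent.*;

public class ParallelScanSketch {
    // Stub region boundaries (start row, stop row); "" means unbounded.
    // In HBase these would come from HTable.getStartEndKeys().
    static final String[][] REGIONS = { {"", "g"}, {"g", "p"}, {"p", ""} };

    // Stub sorted table; a real per-region scanner would come from
    // table.getScanner(new Scan(startRow, stopRow)).
    static final SortedMap<String, String> TABLE = new TreeMap<>();
    static {
        TABLE.put("apple", "1");
        TABLE.put("grape", "2");
        TABLE.put("melon", "3");
        TABLE.put("pear",  "4");
    }

    // Simulates one region scan: returns row keys in [start, stop).
    static List<String> scanRegion(String start, String stop) {
        SortedMap<String, String> slice =
            stop.isEmpty() ? TABLE.tailMap(start) : TABLE.subMap(start, stop);
        return new ArrayList<>(slice.keySet());
    }

    public static void main(String[] args) throws Exception {
        // One scanner per region, each in its own thread.
        ExecutorService pool = Executors.newFixedThreadPool(REGIONS.length);
        List<Future<List<String>>> futures = new ArrayList<>();
        for (String[] r : REGIONS) {
            futures.add(pool.submit(() -> scanRegion(r[0], r[1])));
        }
        // Merge per-region results at the end.
        List<String> merged = new ArrayList<>();
        for (Future<List<String>> f : futures) {
            merged.addAll(f.get());
        }
        pool.shutdown();
        System.out.println(merged);
    }
}
```

Since regions don't overlap, the merged list needs no dedup; each thread owns one disjoint key range.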

If you plan on doing anything like that, keep me / the list in the loop, would be willing to help out.

JG

llpind wrote:
Thanks for the link, that sounds good.

If I multi-thread the scanners, will HBase performance scale as more boxes
are added?

For example, in the code above I had:

for (String typeVal : list) {
  // give me all IDs for matching TYPE|VAL
  Scan tblAScan = new Scan(Bytes.toBytes(typeVal + "|"),
      Bytes.toBytes(typeVal + "|A"));
  ResultScanner s1 = tblA.getScanner(tblAScan);

  for (Result tblBRowResult = s1.next(); tblBRowResult != null;
       tblBRowResult = s1.next()) {
    // IDs are all numeric
    Scan tblBScan = new Scan(tblBRowResult.getValue(),
        Bytes.toBytes(typeVal + " "));
    ResultScanner s2 = tblB.getScanner(tblBScan);
    // only care about column data here, since ID is row key
    List results = s2.next().list();
    for (KeyValue kv : results) {
      // do stuff
      kv.getValue();
    }
  }
}

======================================
Modified it with a Get (not updated above).  Thinking the outer loop (each
new scan) could run in a different thread, and then the results combined at
the end?
I'm looking for ways to increase performance by adding boxes.  How can I
spread the scanner load so it's not waiting on the next iteration?



Jonathan Gray-2 wrote:
Sounds about right.  You seem to have a good grip on things.

0.20 will work with millions of columns in a row, but currently there is no way to return the massive row in segments. If the data is big enough, you'll have memory allocation issues. Scanners are still a safer way to go until we have intra-row scanning: https://issues.apache.org/jira/browse/HBASE-1537

JG

llpind wrote:
Thanks for the tips.

Yeah, that is the model we had before.  The problem is we can potentially
have millions of IDs for a given TYPE|VAL.
We are considering something like:
Row Key: TYPE|VALUE|ID
Column: link:TYPE|VALUE

This only works because an ID may never have more than a few TYPE|VAL
results in this current dataset, which would also eliminate the need to go
to the second table.  Thanks for the help.
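A quick sketch of how that tall layout serves the lookup in one pass (plain Java; a TreeMap stands in for the table, and the person|smith row keys and IDs are made up). All IDs for a TYPE|VALUE come back from a single prefix scan over TYPE|VALUE|, and the link: column on each row already carries the reverse mapping, so no second-table trip is needed:

```java
import java.util.*;

public class TallRowSketch {
    public static void main(String[] args) {
        // Row key TYPE|VALUE|ID -> column link:TYPE|VALUE (the reverse edge).
        SortedMap<String, String> table = new TreeMap<>();
        table.put("person|smith|101",  "link:person|smith");
        table.put("person|smith|102",  "link:person|smith");
        table.put("person|taylor|101", "link:person|taylor");

        // Prefix scan: the range [prefix, prefix + high char) covers every
        // ID row; in HBase this is new Scan(startRow, stopRow) as above.
        String prefix = "person|smith|";
        SortedMap<String, String> hits =
            table.subMap(prefix, prefix + "\uffff");

        // The ID is just the row-key suffix past the prefix.
        List<String> ids = new ArrayList<>();
        for (String rowKey : hits.keySet()) {
            ids.add(rowKey.substring(prefix.length()));
        }
        System.out.println(ids);
    }
}
```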

Jonathan Gray-2 wrote:
Well you're trying to do a join. How much data is actually in TableB? You might consider denormalizing so that you don't have to query TableB, the data you need is already in TableA.

You could use a Get (single trip) for the inner loop rather than a Scanner (which requires multiple round-trips). You could even use a Get for the outer loop by making your table wide instead of tall.

Row Key:  TYPE|VALUE
Column: link:ID

And you have a column for each ID within that TYPE|VALUE row.
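A minimal sketch of the wide-row idea in plain Java (a nested Map stands in for the table, and person|smith plus IDs 101/102 are made-up examples). In HBase, one table.get(new Get(rowKey)) would return the whole TYPE|VALUE row in a single trip, and the IDs are read off the link: column qualifiers:

```java
import java.util.*;

public class WideRowSketch {
    public static void main(String[] args) {
        // Row key TYPE|VALUE -> one column per ID under the link: family.
        // A real table.get(new Get(rowKey)) fetches all of these at once.
        Map<String, SortedMap<String, byte[]>> table = new HashMap<>();
        SortedMap<String, byte[]> row = new TreeMap<>();
        row.put("link:101", new byte[0]);
        row.put("link:102", new byte[0]);
        table.put("person|smith", row);

        // One "Get": fetch the row, then read IDs off the column names.
        SortedMap<String, byte[]> result = table.get("person|smith");
        List<String> ids = new ArrayList<>();
        for (String qualifier : result.keySet()) {
            ids.add(qualifier.substring("link:".length()));
        }
        System.out.println(ids);
    }
}
```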

Also, don't forget to close your scanners if you do use scanners.

JG


llpind wrote:
Assume a schema like so:
TableA======================
Row Key:  TYPE|VALUE|ID
Column:  link:ID  (irrelevant)
TableB======================
Row Key: ID
Column: typeval:TYPE|VALUE
===========================



I need to iterate over TableA using a Scanner to get all IDs for a given
TYPE|VALUE, then for each ID I need to get from TableB which TYPE|VALUEs
it's tied to (a many-to-many).
Assume I have the TYPE|VALUEs in a List, and need to process through this
data.  I've done something like this:



for (String typeVal : list) {

  // give me all IDs for matching TYPE|VAL
  Scan tblAScan = new Scan(Bytes.toBytes(typeVal + "|"),
      Bytes.toBytes(typeVal + "|A"));
  ResultScanner s1 = tblA.getScanner(tblAScan);

  for (Result tblBRowResult = s1.next(); tblBRowResult != null;
       tblBRowResult = s1.next()) {

    // IDs are all numeric
    Scan tblBScan = new Scan(tblBRowResult.getValue(),
        Bytes.toBytes(typeVal + " "));
    ResultScanner s2 = tblB.getScanner(tblBScan);
    // only care about column data here, since ID is row key
    List results = s2.next().list();

    for (KeyValue kv : results) {
      // do stuff
      kv.getValue();
    }
  }
}


