Re: Adding/Removing regionservers

llpind Tue, 07 Jul 2009 15:53:54 -0700

Thanks for the link, that sounds good.

If I multi-thread the scanners, will HBase performance improve as more boxes
are added?


For example, in the earlier example I had:

for (String typeVal : list) {

  // Give me all IDs for the matching TYPE|VAL.
  Scan tblAScan = new Scan(Bytes.toBytes(typeVal + "|"),
                           Bytes.toBytes(typeVal + "|A"));
  ResultScanner s1 = tblA.getScanner(tblAScan);

  for (Result tblBRowResult = s1.next(); tblBRowResult != null;
       tblBRowResult = s1.next()) {

    // IDs are all numeric.
    Scan tblBScan = new Scan(Bytes.toBytes(tblBRowResult.getValue()),
                             Bytes.toBytes(typeVal + " "));
    ResultScanner s2 = tblB.getScanner(tblBScan);
    // Only care about the column data here, since ID is the row key.
    List<KeyValue> results = s2.next().list();

    for (KeyValue kv : results) {
      // do stuff
      kv.getValue();
    }

  }

}
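
The "|" / "|A" bounds in the loop above only work because the IDs are numeric
(every digit sorts before 'A'). A more general way to bound a prefix scan is to
use the prefix as the start row and the prefix with its last byte incremented
as the exclusive stop row. Below is a minimal sketch of that idea in plain
Java; the class and method names (PrefixRange, startRow, stopRow) are made up
for illustration and are not part of the HBase API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: computes [startRow, stopRow) for "all keys with this prefix".
class PrefixRange {

    // Inclusive start row: the prefix bytes themselves.
    static byte[] startRow(String prefix) {
        return prefix.getBytes(StandardCharsets.UTF_8);
    }

    // Exclusive stop row: the prefix with its last byte incremented.
    // Trailing 0xFF bytes cannot be incremented, so they are dropped first.
    // An empty array is returned when no upper bound exists.
    static byte[] stopRow(String prefix) {
        byte[] b = prefix.getBytes(StandardCharsets.UTF_8);
        int i = b.length - 1;
        while (i >= 0 && b[i] == (byte) 0xFF) {
            i--;
        }
        if (i < 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(b, i + 1);
        stop[i]++;
        return stop;
    }
}
```

With this, the outer scan would presumably become something like
new Scan(PrefixRange.startRow(typeVal + "|"), PrefixRange.stopRow(typeVal + "|")),
assuming the Scan(byte[], byte[]) constructor used above.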


======================================
I have since changed the inner loop to use a Get (not reflected in the code
above). I'm thinking each iteration of the outer loop (which opens a new scan)
could run in its own thread, with the results combined at the end?

I'm looking for ways to increase performance by adding boxes.  How can I
spread the scanner load, so it's not waiting for the next iteration?
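
To make the idea concrete, here is a rough sketch of what I mean: one task per
TYPE|VALUE submitted to a fixed thread pool, with the per-task results merged
at the end. It is plain java.util.concurrent and deliberately leaves HBase out.
The scanOne function is a placeholder (an assumption for illustration) for
"scan TableA for this TYPE|VALUE and do the lookups", and ParallelScanSketch
and scanAll are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: fan the outer loop out over a thread pool.
class ParallelScanSketch {

    // scanOne stands in for the per-TYPE|VALUE scan; it is not an HBase call.
    static List<String> scanAll(List<String> typeVals,
                                Function<String, List<String>> scanOne,
                                int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per TYPE|VALUE so the scans can overlap.
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String typeVal : typeVals) {
                Callable<List<String>> task = () -> scanOne.apply(typeVal);
                futures.add(pool.submit(task));
            }
            // Collect in submission order, blocking until each task is done.
            List<String> combined = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                combined.addAll(f.get());
            }
            return combined;
        } finally {
            pool.shutdown();
        }
    }
}
```

As far as I understand, an HTable instance should not be shared between
threads, so each task would need its own table handle inside scanOne. Whether
this actually scales with more boxes presumably depends on the TYPE|VALUE
ranges landing on different regionservers.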



Jonathan Gray-2 wrote:
> 
> Sounds about right.  You seem to have a good grip on things.
> 
> 0.20 will work with millions of columns in a row, but currently there is 
> no way to return the massive row in segments.  If the data is big 
> enough, you'll have memory allocation issues.  Scanners are still a 
> safer way to go until we have intra-row scanning: 
> https://issues.apache.org/jira/browse/HBASE-1537
> 
> JG
> 
> llpind wrote:
>> Thanks for the tips.
>> 
>> Yeah, that is the model we had before. The problem is we can potentially
>> have millions of IDs for a given TYPE|VAL.
>>
>> We are considering something like:
>> Row Key: TYPE|VALUE|ID
>> column: link:TYPE|VALUE
>> 
>> This is only because an ID may never have more than a few TYPE|VAL results
>> in this current dataset, which would also eliminate the need to go to the
>> second table.
>> 
>> Thanks for the help.  
>> 
>> 
>> Jonathan Gray-2 wrote:
>>> Well, you're trying to do a join.  How much data is actually in TableB?
>>> You might consider denormalizing so that you don't have to query TableB,
>>> because the data you need is already in TableA.
>>>
>>> You could use a Get (single trip) for the inner loop rather than a 
>>> Scanner (which requires multiple round-trips).  You could even use a Get 
>>> for the outer loop by making your table wide instead of tall.
>>>
>>> Row Key:  TYPE|VALUE
>>> Column: link:ID
>>>
>>> And you have a column for each ID within that TYPE|VALUE row.
>>>
>>> Also, don't forget to close your scanners if you do use scanners.
>>>
>>> JG
>>>
>>>
>>> llpind wrote:
>>>> Assume a schema like so:  
>>>>
>>>> TableA======================
>>>> Row Key:  TYPE|VALUE|ID
>>>> Column:  link:ID  (irrelevant)
>>>> TableB======================
>>>> Row Key: ID
>>>> Column: typeval:TYPE|VALUE
>>>> ===========================
>>>>
>>>>
>>>>
>>>> I need to iterate over TableA using a Scanner to get all IDs based on
>>>> TYPE|VALUE, then for each ID I need to get from TableB what TYPE|VALUEs
>>>> it's tied to (a many-to-many).
>>>> Assume I have a list of TYPE|VALUEs in a List, and need to process
>>>> through this data.  I've done something like this:
>>>>
>>>>
>>>>
>>>> for (String typeVal : list) {
>>>>
>>>>   // Give me all IDs for the matching TYPE|VAL.
>>>>   Scan tblAScan = new Scan(Bytes.toBytes(typeVal + "|"),
>>>>                            Bytes.toBytes(typeVal + "|A"));
>>>>   ResultScanner s1 = tblA.getScanner(tblAScan);
>>>>
>>>>   for (Result tblBRowResult = s1.next(); tblBRowResult != null;
>>>>        tblBRowResult = s1.next()) {
>>>>
>>>>     // IDs are all numeric.
>>>>     Scan tblBScan = new Scan(Bytes.toBytes(tblBRowResult.getValue()),
>>>>                              Bytes.toBytes(typeVal + " "));
>>>>     ResultScanner s2 = tblB.getScanner(tblBScan);
>>>>     // Only care about the column data here, since ID is the row key.
>>>>     List<KeyValue> results = s2.next().list();
>>>>
>>>>     for (KeyValue kv : results) {
>>>>       // do stuff
>>>>       kv.getValue();
>>>>     }
>>>>
>>>>   }
>>>>
>>>> }
>>>>
>>>
>> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Adding-Removing-regionservers-tp24309642p24382764.html
Sent from the HBase User mailing list archive at Nabble.com.
