Okay, will do. I'm still new to scanners and regions.
HTable has a getRegionsInfo() method which returns a Map<HRegionInfo,
HServerAddress>. I can iterate over this and spawn a scanner per region,
using each region's start/stop keys. What I'm confused about is where my own
start/stop rows fit in, since my loops already use a start/stop row for each
scan. Basically, how do I combine the results from all the threads, given both
my row ranges and the region start/stop row keys?
Could you please explain how to go about this?
Thanks.
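
One way to think about the question above, as a rough sketch rather than the HBase API: intersect ("clamp") your own start/stop rows with each region's start/stop keys, run one scan per overlapping region in its own thread, and append the per-region results in region order. Everything here is an assumption for illustration: the class and method names are hypothetical, a sorted in-memory TreeSet stands in for the table, keys are plain ASCII Strings (HBase compares raw bytes), an empty string means "unbounded" as with the first and last region, and the region list is assumed to be sorted by start key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: none of these names come from HBase.
final class RegionScanSketch {

    /** Half-open key range [start, stop). An empty string means "unbounded" on that side. */
    static final class KeyRange {
        final String start;
        final String stop;
        KeyRange(String start, String stop) { this.start = start; this.stop = stop; }
    }

    /** Intersects the caller's scan range with one region's range; null if they do not overlap. */
    static KeyRange clamp(KeyRange scan, KeyRange region) {
        // Later of the two start keys ("" sorts before every other key).
        String start = scan.start.compareTo(region.start) >= 0 ? scan.start : region.start;
        // Earlier of the two stop keys, treating "" as "no upper bound".
        String stop;
        if (scan.stop.isEmpty()) stop = region.stop;
        else if (region.stop.isEmpty()) stop = scan.stop;
        else stop = scan.stop.compareTo(region.stop) <= 0 ? scan.stop : region.stop;
        if (!stop.isEmpty() && start.compareTo(stop) >= 0) return null;
        return new KeyRange(start, stop);
    }

    /** Stand-in for a per-region scanner: returns the sorted keys in [start, stop). */
    static List<String> scan(TreeSet<String> table, KeyRange r) {
        return new ArrayList<>(r.stop.isEmpty()
                ? table.tailSet(r.start, true)
                : table.subSet(r.start, true, r.stop, false));
    }

    /** One task per overlapping region; results are appended in region order. */
    static List<String> parallelScan(TreeSet<String> table, List<KeyRange> regions,
                                     KeyRange scanRange) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (KeyRange region : regions) {
                final KeyRange clamped = clamp(scanRange, region);
                if (clamped == null) continue; // region holds no keys in our range
                futures.add(pool.submit(() -> scan(table, clamped)));
            }
            List<String> merged = new ArrayList<>();
            for (Future<List<String>> f : futures) merged.addAll(f.get());
            return merged;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        TreeSet<String> table = new TreeSet<>(Arrays.asList(
                "A|1|100", "A|1|200", "A|2|300", "B|1|100", "B|1|400", "C|9|500"));
        List<KeyRange> regions = Arrays.asList(
                new KeyRange("", "A|2"), new KeyRange("A|2", "B|1|2"), new KeyRange("B|1|2", ""));
        // Same range shape as the thread's scan: typeVal + "|" up to typeVal + "|A".
        System.out.println(parallelScan(table, regions, new KeyRange("B|1|", "B|1|A")));
    }
}
```

In this sketch the row range for "B|1" straddles two regions, so two tasks run and the first region is skipped entirely. Against a real table, the region boundaries would presumably come from each HRegionInfo's start and end keys, and each task would open (and close) its own scanner over the clamped range.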
Jonathan Gray-2 wrote:
>
> Yes, you could multi-thread your scanners.
>
> You could query for region information to get the start/stop rows for
> the regions in the table, and then spin up a scanner in each thread for
> each region.
>
> If you plan on doing anything like that, keep me / the list in the loop;
> I would be willing to help out.
>
> JG
>
> llpind wrote:
>> Thanks for the link, that sounds good.
>>
>> If I multi-thread scanners, will HBase performance speed up as more boxes
>> are added?
>>
>> For example, in the example above I had:
>>
>> for (String typeVal : list) {
>>
>>   // Give me all IDs for the matching TYPE|VAL.
>>   Scan tblAScan = new Scan(Bytes.toBytes(typeVal + "|"),
>>       Bytes.toBytes(typeVal + "|A"));
>>   ResultScanner s1 = tblA.getScanner(tblAScan);
>>
>>   for (Result tblBRowResult = s1.next(); tblBRowResult != null;
>>       tblBRowResult = s1.next()) {
>>
>>     // IDs are all numeric.
>>     Scan tblBScan = new Scan(Bytes.toBytes(tblBRowResult.getValue()),
>>         Bytes.toBytes(typeVal + " "));
>>     ResultScanner s2 = tblB.getScanner(tblBScan);
>>     // Only care about column data here, since ID is the row key.
>>     List<KeyValue> results = s2.next().list();
>>
>>     for (KeyValue kv : results) {
>>       // do stuff
>>       kv.getValue();
>>     }
>>
>>   }
>>
>> }
>>
>>
>> ======================================
>> I've since modified it to use a Get (not updated above). I'm thinking each
>> iteration of the outer loop (which creates a new scan) could run in a
>> different thread, with the results combined at the end?
>>
>> I'm looking for ways to increase performance by adding boxes. How can I
>> spread the scanner load, so it's not waiting for the next iteration?
>>
>>
>>
>> Jonathan Gray-2 wrote:
>>> Sounds about right. You seem to have a good grip on things.
>>>
>>> 0.20 will work with millions of columns in a row, but currently there is
>>> no way to return the massive row in segments. If the data is big
>>> enough, you'll have memory allocation issues. Scanners are still a
>>> safer way to go until we have intra-row scanning:
>>> https://issues.apache.org/jira/browse/HBASE-1537
>>>
>>> JG
>>>
>>> llpind wrote:
>>>> Thanks for the tips.
>>>>
>>>> Yeah, that is the model we had before; the problem is that we can
>>>> potentially have millions of IDs for a given TYPE|VAL.
>>>>
>>>> We are considering something like:
>>>> Row Key: TYPE|VALUE|ID
>>>> column: link:TYPE|VALUE
>>>>
>>>> This is only because an ID may never have more than a few TYPE|VAL
>>>> results in the current dataset, which would also eliminate the need to
>>>> go to the second table.
>>>>
>>>> Thanks for the help.
>>>>
>>>>
>>>> Jonathan Gray-2 wrote:
>>>>> Well, you're trying to do a join. How much data is actually in TableB?
>>>>> You might consider denormalizing so that you don't have to query
>>>>> TableB, because the data you need is already in TableA.
>>>>>
>>>>> You could use a Get (single trip) for the inner loop rather than a
>>>>> Scanner (which requires multiple round-trips). You could even use a
>>>>> Get for the outer loop by making your table wide instead of tall.
>>>>>
>>>>> Row Key: TYPE|VALUE
>>>>> Column: link:ID
>>>>>
>>>>> And you have a column for each ID within that TYPE|VALUE row.
>>>>>
>>>>> Also, don't forget to close your scanners if you do use scanners.
>>>>>
>>>>> JG
>>>>>
>>>>>
>>>>> llpind wrote:
>>>>>> Assume a schema like so:
>>>>>>
>>>>>> TableA======================
>>>>>> Row Key: TYPE|VALUE|ID
>>>>>> Column: link:ID (irrelevant)
>>>>>> TableB======================
>>>>>> Row Key: ID
>>>>>> Column: typeval:TYPE|VALUE
>>>>>> ===========================
>>>>>>
>>>>>>
>>>>>>
>>>>>> I need to iterate over TableA using a Scanner to get all IDs based
>>>>>> on TYPE|VALUE, then for each ID I need to get from TableB which
>>>>>> TYPE|VALUEs it's tied to (a many-to-many relationship).
>>>>>> Assume I have the TYPE|VALUEs in a List and need to process through
>>>>>> this data. I've done something like this:
>>>>>>
>>>>>>
>>>>>>
>>>>>> for (String typeVal : list) {
>>>>>>
>>>>>>   // Give me all IDs for the matching TYPE|VAL.
>>>>>>   Scan tblAScan = new Scan(Bytes.toBytes(typeVal + "|"),
>>>>>>       Bytes.toBytes(typeVal + "|A"));
>>>>>>   ResultScanner s1 = tblA.getScanner(tblAScan);
>>>>>>
>>>>>>   for (Result tblBRowResult = s1.next(); tblBRowResult != null;
>>>>>>       tblBRowResult = s1.next()) {
>>>>>>
>>>>>>     // IDs are all numeric.
>>>>>>     Scan tblBScan = new Scan(Bytes.toBytes(tblBRowResult.getValue()),
>>>>>>         Bytes.toBytes(typeVal + " "));
>>>>>>     ResultScanner s2 = tblB.getScanner(tblBScan);
>>>>>>     // Only care about column data here, since ID is the row key.
>>>>>>     List<KeyValue> results = s2.next().list();
>>>>>>
>>>>>>     for (KeyValue kv : results) {
>>>>>>       // do stuff
>>>>>>       kv.getValue();
>>>>>>     }
>>>>>>
>>>>>>   }
>>>>>>
>>>>>> }
>>>>>>
>>>
>>
>
>
--
View this message in context:
http://www.nabble.com/Adding-Removing-regionservers-tp24309642p24395309.html
Sent from the HBase User mailing list archive at Nabble.com.