Sizing and tuning HBase takes some skill, but there is a lot of help
available on the web. Here are some basic principles to begin with.

1. Do not colocate HBase Region Servers and MapReduce on the same nodes.
Shut down the Node Managers on the nodes running the Region Servers. This
reduces your MR capacity but makes HBase a lot more stable.
2. Size your Region Servers correctly. Here is a great blog by Lars on
this subject.
https://www.quora.com/HBase-Region-Server-guidelines-give-a-size-range-of-about-1TB-whereas-data-nodes-are-configured-20-times-bigger-Why
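
As a rough illustration of point 2, here is a back-of-the-envelope sketch
in Python. It uses the rule of thumb from the HBase reference guide
(heap * memstore fraction / (flush size * column families)); the function
names are made up for this example, and the numbers are the usual defaults
as far as I know, not measurements from any cluster. Treat the result as
an estimate only.

```python
GiB = 1024 ** 3
MiB = 1024 ** 2


def max_active_regions(heap_bytes, memstore_fraction, flush_size_bytes,
                       column_families):
    """Rule-of-thumb count of actively written regions one Region Server
    can hold before memstores outgrow their share of the heap."""
    return int(heap_bytes * memstore_fraction
               / (flush_size_bytes * column_families))


def approx_data_per_server(regions, max_region_size_bytes):
    """Very rough upper bound on data served, if every region is full."""
    return regions * max_region_size_bytes


# Assumed values: 16 GiB heap, 0.4 memstore fraction, 128 MiB flush size,
# one column family, 10 GiB maximum region size.
regions = max_active_regions(16 * GiB, 0.4, 128 * MiB, 1)
capacity = approx_data_per_server(regions, 10 * GiB)

print(regions, "regions, about", capacity // GiB, "GiB per Region Server")
```

With these assumed values the estimate stays well under 1 TB per Region
Server, which is in line with the sizing guidance discussed at the link
above.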

Regards
Seshu Adunuthula


On 6/19/15, 3:12 AM, "Li Yang" <[email protected]> wrote:

>In the end, HBase is the bottleneck for the number of parallel queries,
>because every query will be translated into one or more HBase scans.
>Assuming not much online processing is required (the data is
>pre-aggregated, right), the HBase scan will be the bottleneck.
>
>On Thu, Jun 11, 2015 at 5:34 PM, Shi, Shaofeng <[email protected]> wrote:
>
>> Recommend for reading:
>>
>> http://www.slideshare.net/YangLi43/design-cube-in-apache-kylin
>>
>>
>> On 6/11/15, 4:28 PM, "Vineet Mishra" <[email protected]> wrote:
>>
>> >Hi,
>> >
>> >I was trying Kylin for one of my use cases, where the data cube size is
>> >110 MB with 5 million records. The query for the full data takes around
>> >a minute, which seems like a hell of a lot of time. Apart from this, I
>> >was wondering what query load Kylin can handle in parallel.
>> >
>> >For instance, how many queries can be fired in parallel against our
>> >aggregated data cubes, and is there some practice which can improve
>> >query performance?
>> >
>> >Urgent Call!
>> >
>> >Thanks!
>>
>>
