Hi Vineet,
A single query that pulls 5 million records will take a long time, and
that is not the recommended way to use Kylin.
In our internal performance testing, Kylin could handle hundreds of QPS
for small queries on a single machine with several Tomcat instances. Please
refer to these slides (P31) for more detail:
http://www.slideshare.net/lukehan/apache-kylin-big-data-technology-conference-2014-beijing-v2
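If you want to get a rough feel for parallel query throughput on your own setup, something like the sketch below could be a starting point. It is only an illustration: run_query is a hypothetical placeholder that just sleeps, not a Kylin API, and the SQL text, table name, and worker counts are made up. You would need to replace run_query with a real call through whatever client you use.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_query(sql):
    """Hypothetical stand-in for a real query call; it only sleeps briefly."""
    time.sleep(0.01)
    return sql


def measure_qps(query_fn, sql, total_queries=200, workers=20):
    """Fire total_queries calls across a thread pool and return queries/sec."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map blocks until every submitted query has returned
        results = list(pool.map(query_fn, [sql] * total_queries))
    elapsed = time.perf_counter() - start
    return len(results) / elapsed


if __name__ == "__main__":
    # Placeholder SQL; a small aggregated query is the intended shape
    sql = "select some_dim, count(*) from some_table group by some_dim"
    qps = measure_qps(run_query, sql)
    print(f"approx. {qps:.0f} queries/sec")
```

The number it prints only reflects the placeholder sleep, so it says nothing about Kylin until a real query call is plugged in.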
Kylin is not a general-purpose database; it serves only certain cases well.
Please evaluate your requirements, use case, and data. We would appreciate
it if you could share more detail about your case, so that we have a clearer
idea of how to help you :)
BTW, is "Urgent Call!" your signature, or is this really urgent? I see it in
every one of your threads and have been wondering about it :-)
Thank you very much
Luke
Best Regards!
---------------------
Luke Han
On Fri, Jun 19, 2015 at 7:51 AM, Adunuthula, Seshu <[email protected]>
wrote:
> Sizing and tuning HBase requires some skill, but there is a lot of help
> available on the web. Here are some basic principles to begin with.
>
> 1. Do not colocate HBase Region Servers and MapReduce on the same nodes.
> Shut down the Node Managers on the nodes running the Region Servers. This
> reduces your MR capacity but makes your HBase a lot more stable.
> 2. Size your Region Servers correctly. Here is a great blog by Lars on
> this subject.
> https://www.quora.com/HBase-Region-Server-guidelines-give-a-size-range-of-about-1TB-whereas-data-nodes-are-configured-20-times-bigger-Why
>
> Regards
> Seshu Adunuthula
>
>
> On 6/19/15, 3:12 AM, "Li Yang" <[email protected]> wrote:
>
> >In the end, HBase is the bottleneck for the number of parallel queries,
> >because every query is translated into one or more HBase scans. Assuming
> >not much online processing is required (the data is pre-aggregated,
> >right?), the HBase scan will be the bottleneck.
> >
> >On Thu, Jun 11, 2015 at 5:34 PM, Shi, Shaofeng <[email protected]> wrote:
> >
> >> Recommended reading:
> >>
> >> http://www.slideshare.net/YangLi43/design-cube-in-apache-kylin
> >>
> >>
> >> On 6/11/15, 4:28 PM, "Vineet Mishra" <[email protected]> wrote:
> >>
> >> >Hi,
> >> >
> >> >I was trying Kylin for one of my use cases, where the data cube size is
> >> >110 MB with 5 million records. The query for the full data takes around
> >> >a minute or so, which seems like a hell of a lot of time. Apart from
> >> >this, I was wondering what the query threshold is that Kylin can handle
> >> >in parallel.
> >> >
> >> >For instance, how many queries can be fired in parallel at our
> >> >aggregated data cubes, and is there some practice that can improve
> >> >query performance?
> >> >
> >> >Urgent Call!
> >> >
> >> >Thanks!
> >>
> >>
>
>