Re: Kylin Query Latency and Number of Parallel Queries

Vineet Mishra Fri, 19 Jun 2015 23:51:39 -0700

Thanks Luke! :)

On Sat, Jun 20, 2015 at 11:44 AM, Luke Han <[email protected]> wrote:


> Hi Vineet,
>     I got it; please feel free to keep posting your questions here. We are
> happy to help but, frankly, we can't guarantee a response time since we
> also have our own internal tasks. We will try our best to help everyone
> use Kylin smoothly.
>     For your case, concurrency should not be an issue if you can control
> the queries coming from Tableau, that is, do not allow a Tableau
> dashboard/report to pull a huge amount of data in one query. For example,
> please use "connect live" rather than "import data" in Tableau.
>     Also, please set up more nodes to serve highly concurrent requests;
> Kylin's REST server is stateless, so it scales out very well.
>
>     If you hit any issue, please let us know.
>     Thanks.
>
>
>
>
> Best Regards!
> ---------------------
>
> Luke Han
>
> On Fri, Jun 19, 2015 at 5:48 PM, Vineet Mishra <[email protected]>
> wrote:
>
> > Thanks Luke for the prompt response.
> >
> > Since the Kylin project is still incubating, with a comparatively less
> > active mailing list, and my project has already slipped past its expected
> > delivery timeline, I had to put it that way! :)
> >
> > Well, my use case is to get aggregated data across various dimensions and
> > visualize it in Tableau. The visualization will be accessed by hundreds of
> > users (or even more) over a live connection, so many concurrent queries
> > are expected.
> >
> > On Fri, Jun 19, 2015 at 11:57 PM, Luke Han <[email protected]> wrote:
> >
> > > Hi Vineet,
> > >     A single query that pulls 5 million records will take a long time,
> > > and it is not the recommended way to use Kylin.
> > >     In our internal performance testing, Kylin could handle hundreds of
> > > QPS for small queries on a single machine with several Tomcat instances;
> > > please refer to these slides (p. 31) for more detail:
> > >
> > > http://www.slideshare.net/lukehan/apache-kylin-big-data-technology-conference-2014-beijing-v2
> > >
> > >     Kylin is not a database; it serves only certain cases well. Please
> > > evaluate your requirements, use case, and data. We would appreciate it
> > > if you could share more detail about your case, so that we have a
> > > clearer idea of how to help you :)
> > >
> > >     BTW, is "Urgent Call!" your signature, or is this really urgent? I
> > > see it in every thread of yours and have been wondering about it :-)
> > >
> > >     Thank you very much
> > >
> > > Luke
> > >
> > >
> > >
> > > Best Regards!
> > > ---------------------
> > >
> > > Luke Han
> > >
> > > On Fri, Jun 19, 2015 at 7:51 AM, Adunuthula, Seshu <[email protected]>
> > > wrote:
> > >
> > > > Sizing and tuning HBase requires some skill, but there is a lot of
> > > > help available on the web. Here are some basic principles to begin with.
> > > >
> > > > 1. Do not colocate HBase Region Servers and MapReduce on the same
> > > > nodes. Shut down the Node Managers on the nodes running the Region
> > > > Servers. This reduces your MR capacity but makes HBase a lot more stable.
> > > > 2. Size your Region Servers correctly. Here is a great blog by Lars on
> > > > this subject:
> > > >
> > > > https://www.quora.com/HBase-Region-Server-guidelines-give-a-size-range-of-about-1TB-whereas-data-nodes-are-configured-20-times-bigger-Why
> > > >
> > > > Regards
> > > > Seshu Adunuthula
> > > >
> > > >
> > > > On 6/19/15, 3:12 AM, "Li Yang" <[email protected]> wrote:
> > > >
> > > > >In the end, HBase is the bottleneck for the number of parallel
> > > > >queries, because every query is translated into one or more HBase
> > > > >scans. Assuming not much online processing is required (the data is
> > > > >pre-aggregated, right?), the HBase scan will be the bottleneck.
> > > > >
> > > > >On Thu, Jun 11, 2015 at 5:34 PM, Shi, Shaofeng <[email protected]>
> > > > >wrote:
> > > > >
> > > > >> Recommended reading:
> > > > >>
> > > > >> http://www.slideshare.net/YangLi43/design-cube-in-apache-kylin
> > > > >>
> > > > >>
> > > > >> On 6/11/15, 4:28 PM, "Vineet Mishra" <[email protected]>
> > > > >> wrote:
> > > > >>
> > > > >> >Hi,
> > > > >> >
> > > > >> >I was trying Kylin for one of my use cases, where the data cube
> > > > >> >size is 110 MB with 5 million records. A query for the full data
> > > > >> >takes around a minute, which seems like a very long time. Apart
> > > > >> >from this, I was wondering how many queries Kylin can handle in
> > > > >> >parallel.
> > > > >> >
> > > > >> >For instance, how many queries can be fired in parallel against our
> > > > >> >aggregated data cubes, and is there some practice that can improve
> > > > >> >query performance?
> > > > >> >
> > > > >> >Urgent Call!
> > > > >> >
> > > > >> >Thanks!
> > > > >>
> > > > >>
> > > >
> > > >
> > >
> >
>
