Stream the data in through, say, Storm, and join the dimensions on the fly
by querying HBase or another KV store for the dimension information. These
stores provide high-speed lookups and scale well, so you can avoid costly
SQL joins on the data. Yahoo published a benchmark of streaming engines
(Storm, Spark Streaming and Flink) last Christmas. The use case they chose
closely resembles this one. Check it out.
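
A rough sketch of the lookup-join idea, just to make it concrete. This is
plain Python, not Storm or HBase code: the dict DIMENSIONS is a hypothetical
in-memory stand-in for the KV store, and the field names (ne_id, area,
province, kpi, value) are made up for illustration.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical stand-in for an HBase/KV dimension table,
# keyed by network element id (assumed field names).
DIMENSIONS: Dict[str, Dict[str, str]] = {
    "ne-1": {"area": "Area1", "province": "P1"},
    "ne-2": {"area": "Area2", "province": "P1"},
}


def enrich(events: Iterable[dict], dims: Dict[str, Dict[str, str]]) -> Iterator[dict]:
    """Attach dimension attributes to each event with one lookup per key."""
    for event in events:
        dim = dims.get(event["ne_id"])
        if dim is None:
            # Unknown element: keep the event, mark its dimensions as missing.
            dim = {"area": "UNKNOWN", "province": "UNKNOWN"}
        # Merge the raw event with its dimension attributes (denormalized row).
        yield {**event, **dim}


events = [
    {"ne_id": "ne-1", "kpi": "throughput", "value": 10.0},
    {"ne_id": "ne-2", "kpi": "throughput", "value": 7.5},
    {"ne_id": "ne-9", "kpi": "throughput", "value": 1.0},
]
joined = list(enrich(events, DIMENSIONS))
```

In a real topology the dict lookup would be a get against the KV store,
probably with a local cache in front of it, since the dimension data changes
far less often than the KPI stream.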

You can then land the joined data in HDFS and use Impala or another
high-speed SQL engine for queries.
Kylin supports streaming aggregations as well, I think. You may want to read
up on that.
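
Once the data is landed already joined, the report queries need no join at
all. A small sketch of that, using Python's built-in sqlite3 purely as a
stand-in for Impala; the table kpi_joined and its columns are hypothetical.
It covers the group-by aggregators and a Top N query from the original mail.

```python
import sqlite3

# In-memory database standing in for the SQL engine over the landed data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE kpi_joined (ne_id TEXT, area TEXT, kpi TEXT, value REAL)"
)
rows = [
    ("ne-1", "Area1", "throughput", 10.0),
    ("ne-2", "Area1", "throughput", 30.0),
    ("ne-3", "Area2", "throughput", 7.5),
    ("ne-4", "Area3", "throughput", 50.0),
]
conn.executemany("INSERT INTO kpi_joined VALUES (?, ?, ?, ?)", rows)

# OLAP-style aggregation: the area column is already on every row,
# so this is a plain GROUP BY with no join against inventory data.
per_area = conn.execute(
    "SELECT area, SUM(value), MAX(value), MIN(value), AVG(value) "
    "FROM kpi_joined GROUP BY area ORDER BY area"
).fetchall()

# Top talkers: Top N (here N = 2) elements by KPI value.
top_talkers = conn.execute(
    "SELECT ne_id, value FROM kpi_joined ORDER BY value DESC LIMIT 2"
).fetchall()
```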

The cube build itself takes a lot of time, and I am not sure how fast you
can make it. A 2-second target can be extremely challenging. Please post
back on how you solved this problem.

Best,
Sarnath
On Jul 5, 2016 7:12 AM, "Santoshakhilesh" <[email protected]>
wrote:

>
>
> -----Original Message-----
> From: Santosh Akhilesh [mailto:[email protected]]
> Sent: 02 July 2016 17:55
> To: [email protected]
> Cc: Santoshakhilesh
> Subject: Few Questions about Kylin Ability
>
> Hi All ,
> Last year I had done a PoC for one of our products using Kylin. Our
> distributed architecture journey was on hold for some time but now we are
> back again to rearchitect our system to distributed. I am writing this mail
> to understand how and whether Kylin can fit in to our requirements.
> Let me give background of our requirement.
> Ours is a network performance management solution which needs to handle
> following scenes.
>
> 1. Collect data from network elements in granularity between 30 sec to 5
> minute period. Every period we collect around 150Million KPIs Which are
> distributed across different service type. The service types are model
> driven and can change over period of time.
> 2. Data which we collect needs to available for Adhoc and OLAP type query
> ASAP. For example data collected between 10:00 and 10:05 for 5 mins period
> should be available for reports to fire query by 10:06. Query will involve
> joining performance data with inventory data and also have filters like
> query data for Area = Area1 and we also need sort by KPI or property of
> inventory with order by Clause
> 3. We also need OLAP type query like group by area , province , country
> etc... and needs to apply sum , max , min , avg aggregator. We also need
> to generate Top talkers report which means we need Top N function.
> 4. There will be background machine learning jobs which need to scan raw
> and aggregated data.
> 5. We would be generating around 5-10 TB of data every day and In future
> may be more.
> Now my questions are these. We need to retain data for several days and
> months based on aggregation period.
> 6. Adhoc and OLAP query from report should take < 2 seconds.
> So my questions are;
>
> 1. Which of the use cases Kylin can support?
> 2. How long cube building takes and how does it handle the data which will
> be appended every 30 sec or 5 minutes.
> 3. Can Kylin support both Adhoc query and OLAP query ?
>
> I have several other questions but I would like to initiate the discussion
> with these.
> We plan to start a test next week with Kylin I am just setting up a
> cluster now. We don't plan to use cloud era or Horton work sandbox as our
> company has its own sandbox.
>
> Appreciate response from Kylin experts.
>
> Regards
> Santosh
>
>
> Sent from my iPhone
>
