Hi Gaspare,
You have raised a great discussion about those things.
In the original idea there was only the cube, but we have come up with
a new concept, the Data Model, since a "Cube" itself is just one kind of
storage.
The modeler has an option to define/pick which kind of storage to use
for the Data Model. We call this the Realization interface; it covers
Cube, Streaming and Inverted Index, and is extensible to others in the
future.
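
To make the idea a bit more concrete, here is a rough Python sketch of
a pluggable storage interface. It is not Kylin code: every class, method
and registry name below is made up for illustration, and the real
Realization interface may look quite different. It only shows a Data
Model that picks its storage by name, with one backend that
pre-aggregates (cube style) and one that keeps raw rows and aggregates
at query time (inverted-index style).

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class Realization(ABC):
    """Hypothetical storage interface; all names here are illustrative only."""

    @abstractmethod
    def ingest(self, rows):
        """Store a batch of rows (dicts mapping column name to value)."""

    @abstractmethod
    def sum_by(self, dimension, measure):
        """Return {dimension value: SUM(measure)}."""


class CubeRealization(Realization):
    """Pre-aggregates at ingest time, so queries are simple lookups."""

    def __init__(self, dimensions, measures):
        self.dimensions = dimensions
        self.measures = measures
        self.totals = defaultdict(int)  # keyed by (dimension, value, measure)

    def ingest(self, rows):
        for row in rows:
            for dim in self.dimensions:
                for m in self.measures:
                    self.totals[(dim, row[dim], m)] += row[m]

    def sum_by(self, dimension, measure):
        return {value: total
                for (dim, value, m), total in self.totals.items()
                if dim == dimension and m == measure}


class InvertedIndexRealization(Realization):
    """Keeps raw rows and aggregates at query time."""

    def __init__(self, dimensions, measures):
        self.rows = []

    def ingest(self, rows):
        self.rows.extend(rows)

    def sum_by(self, dimension, measure):
        totals = defaultdict(int)
        for row in self.rows:
            totals[row[dimension]] += row[measure]
        return dict(totals)


# Registry that lets new storage kinds be plugged in later.
REALIZATIONS = {
    "cube": CubeRealization,
    "inverted_index": InvertedIndexRealization,
}


class DataModel:
    """A model whose storage is chosen by name by the modeler."""

    def __init__(self, dimensions, measures, realization="cube"):
        self.storage = REALIZATIONS[realization](dimensions, measures)

    def ingest(self, rows):
        self.storage.ingest(rows)

    def sum_by(self, dimension, measure):
        return self.storage.sum_by(dimension, measure)
```

With this sketch, DataModel(["node"], ["upLink"], realization="cube")
and the same call with realization="inverted_index" answer
sum_by("node", "upLink") identically; only where the aggregation work
happens differs. A streaming backend could be added to the registry in
the same way.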
So you are right: there will be a UI setting in the Data Model for this.
It will come later, since 2.x is under heavy refactoring and tuning,
just as Hongbin mentioned.
Please stay tuned for the latest updates on the streaming/realtime
capability of Kylin.
Thanks.
Luke
Best Regards!
---------------------
Luke Han
On Wed, Sep 23, 2015 at 2:55 PM, hongbin ma <[email protected]> wrote:
> hi gaspare
>
> Actually, we do have a similar solution in the 2.x-staging code base. It is
> called "Streaming Cubing" (in contrast to Inverted Index, it uses a
> mini-batch cubing approach to tackle the near-real-time problem).
>
> Daemon threads start up periodically to consume a batch of data (perhaps
> a five-minute batch) from Kafka and build a mini-cube in memory before
> writing it into HBase. We have not officially announced the functionality
> because:
>
> 1. Currently we do not have a front-end UI for the configuration,
> including specifying Kafka settings, etc. This makes Streaming
> Cubing difficult to use for now. The good news is that we're actively
> working on it (https://issues.apache.org/jira/browse/KYLIN-1041).
> 2. Lack of documentation.
> 3. Currently we do not leverage Spark Streaming (or other alternatives)
> to process the data batch. Our daemon thread is a simple Java thread, and
> it will be problematic when the data batch grows too large. We intend to
> migrate to a horizontally scalable solution like Spark Streaming, but
> haven't had enough bandwidth to start on it
> (https://issues.apache.org/jira/browse/KYLIN-1042).
>
> Anyway, customers should be able to use Streaming Cubing when we officially
> announce the 2.x versions.
>
>
>
>
>
> On Wed, Sep 23, 2015 at 6:00 AM, Gaspare Maria <
> [email protected]> wrote:
>
> > Hi,
> >
> > one more question/feedback regarding Kylin Real time.
> >
> > There are many use-cases (in particular in the TELCO environment) where
> > streams of data arrive at regular intervals (usually every 5 or 15
> > minutes) and "real-time" aggregations can always be done per interval
> > (for example, SUM(upLink) per node in the last interval). In such
> > use-cases, "maybe" the CUBE could be updated in near real-time after
> > pre-aggregation with Spark Streaming (of course without creating the
> > HFiles, but using parallel PUTs on HBase with Spark). According to our
> > experience, for "simple" CUBEs this should be faster than Inverted Indexes.
> >
> > Of course there are use-cases where this approach is not applicable, in
> > those cases Inverted Indexes are still valid.
> >
> > It would be good if Kylin gave the "CUBE Administrator" the possibility
> > to choose how to do the "Real-time CUBE Update". For example, give the
> > option to choose either "Inverted Indexes" or "HBase".
> >
> > Do you think such an approach could be applicable to Kylin?
> >
> > Regards,
> >
> > -- gas
> >
> >
> >
> > On 09/21/2015 11:36 AM, Li Yang wrote:
> >
> >> Gas is mostly right, with one addition: a query can hit both the
> >> inverted index and the cube if it asks for both the latest and historic
> >> data. The results from the two sources are aggregated at query time.
> >>
> >> On Fri, Sep 18, 2015 at 11:26 PM, Gaspare Maria <
> >> [email protected]> wrote:
> >>
> >>> Hi,
> >>>
> >>> so if I understood correctly, the idea behind Kylin Real Time is:
> >>>
> >>> * Inverted Indexes (maybe Lucene or inverted indexes on HBase) will
> >>> be built according to the CUBE schema in near-realtime by using Spark
> >>> (streaming) Kafka consumers;
> >>> * At query time, if the query touches the latest data it will be routed
> >>> to the Inverted Indexes, otherwise to the CUBE on HBase;
> >>> * Queries that touch the latest data will be limited, due to the
> >>> limitations of inverted indexes;
> >>> * A query over a long period going back (e.g. from now back to 2 months
> >>> ago) will be routed partly to HBase and partly to the Inverted Indexes.
> >>>
> >>>
> >>> Am I right?
> >>>
> >>> Regards,
> >>>
> >>> -- gas
> >>>
> >>>
> >>>
> >>> On 09/18/2015 12:35 AM, Henry Saputra wrote:
> >>>
> >>> Awesome, thanks Luke
> >>>>
> >>>> On Thu, Sep 17, 2015 at 2:37 AM, Luke Han <[email protected]> wrote:
> >>>>
> >>>> Here's JIRA: https://issues.apache.org/jira/browse/KYLIN-599
> >>>>>
> >>>>>
> >>>>> Best Regards!
> >>>>> ---------------------
> >>>>>
> >>>>> Luke Han
> >>>>>
> >>>>> On Thu, Sep 17, 2015 at 1:09 AM, Henry Saputra <
> >>>>> [email protected]>
> >>>>> wrote:
> >>>>>
> >>>>>> That is good to know. Li Yang, Luke, could one of you share the
> >>>>>> design document for this realtime OLAP query in the JIRA?
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>> - Henry
> >>>>>>
> >>>>>> On Tue, Sep 15, 2015 at 11:12 PM, Li Yang <[email protected]> wrote:
> >>>>>>
> >>>>>>>> There will be incremental updates on the existing cubes, but during
> >>>>>>>> those updates I suppose no queries will be run against them?
> >>>>>>>
> >>>>>>> Yes, it's mini-batch, usually at an interval of minutes. And of course
> >>>>>>> the cube CAN serve queries while the mini incremental build is in
> >>>>>>> progress. How could we take the cube offline every few minutes?
> >>>>>>> That's impossible. :-)
> >>>>>>>
> >>>>>>> On Tue, Sep 15, 2015 at 6:41 PM, Sarnath <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> Inverted index? That sounds interesting. We use inverted index to
> >>>>>>>> serve the cubes in our internal implementation.
> >>>>>>>>
> >>>>>>>> I come from Big Data Center of excellence from an Indian IT major.
> >>>>>>>>
> >>>>>>>> We have been experimenting with the idea of serving cubes through
> >>>>>>>> ElasticSearch REST API. This is not related to Kylin. This is our
> >>>>>>>> own internal development.
> >>>>>>>>
> >>>>>>>> The motivation for this is --- Once the cube is built, it needs to
> >>>>>>>> be served.
> >>>>>>>>
> >>>>>>>> The query looks somewhat like this:
> >>>>>>>>
> >>>>>>>> "Given ProductID=*, Year=2015, Fetch All Quantities Sold"
> >>>>>>>>
> >>>>>>>> "Given ProductID=XX, Fetch how much it has sold every Month"
> >>>>>>>>
> >>>>>>>> Find all entries that match K1=V1, K2=V2
> >>>>>>>>
> >>>>>>>> This relieves us from lot of things - storage, REST API etc. and
> >>>>>>>> makes the cubes easily searchable.
> >>>>>>>>
> >>>>>>>> However, we don't do SQL/MDX on top of it. Tableau 9.1Beta is
> >>>>>>>> experimenting with Web-Data-Connector which we believe can be used
> >>>>>>>> for Visualization... Apart from that, we experimented with a few
> >>>>>>>> auto-generated Kibana dashboards which were just okay. But Kibana
> >>>>>>>> was not designed for Cubes and so it has its own limitations.
> >>>>>>>>
> >>>>>>>> Appreciate any feedback!
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>>
> >>>>>>>> Sarnath
> >>>>>>>>
> >>>>>>>> I also think that it's mini-batch cubing. It's time to bring the
> >>>>>>>> inverted index back into the roadmap. The inverted index will be the
> >>>>>>>> true real-time solution and can provide low-level query capability
> >>>>>>>> on the raw data.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks!
> >>>>>>>> JiangXu
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> ------------------ Original Message ------------------
> >>>>>>>> From: "Henry Saputra";<[email protected]>;
> >>>>>>>> Sent: Tuesday, September 15, 2015, 12:39 PM
> >>>>>>>> To: "[email protected]"<[email protected]>;
> >>>>>>>> Subject: Re: Kylin Real time
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Ok, but that still seems like mini batch to me.
> >>>>>>>>
> >>>>>>>> There will be incremental updates on the existing cubes, but during
> >>>>>>>> those updates I suppose no queries will be run against them?
> >>>>>>>>
> >>>>>>>> - Henry
> >>>>>>>>
> >>>>>>>> On Mon, Sep 14, 2015 at 12:33 AM, Li Yang <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>>> Streaming OLAP provides near-realtime analysis where the data delay
> >>>>>>>>> can be as short as a few minutes.
> >>>>>>>>>
> >>>>>>>>> A traditional daily build lets users analyze yesterday's data. If we
> >>>>>>>>> increase the frequency to hourly, users can analyze the last hour's
> >>>>>>>>> data. Further down the line, how about an incremental build every 5
> >>>>>>>>> minutes from a streaming source? Then users can analyze data from 5
> >>>>>>>>> minutes ago. That's Streaming OLAP!
> >>>>>>>>>
> >>>>>>>>> On Mon, Sep 14, 2015 at 12:43 AM, Henry Saputra <[email protected]>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi Luke,
> >>>>>>>>>>
> >>>>>>>>>> Could you clarify again what streaming OLAP means here?
> >>>>>>>>>>
> >>>>>>>>>> By definition, OLAP works with historical data.
> >>>>>>>>>>
> >>>>>>>>>> Maybe I missed it, but were there any discussions or a proposed
> >>>>>>>>>> design for it?
> >>>>>>>>>>
> >>>>>>>>>> Thanks,
> >>>>>>>>>>
> >>>>>>>>>> - Henry
> >>>>>>>>>>
> >>>>>>>>>> On Monday, August 3, 2015, Luke Han <[email protected]> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi Siddharth,
> >>>>>>>>>>>
> >>>>>>>>>>> Kylin's next major release (0.8.x) will support Streaming OLAP,
> >>>>>>>>>>> which will come in Q4 since it is still under development, as
> >>>>>>>>>>> Hongbin mentioned above.
> >>>>>>>>>>> Could you please drop me a mail about your case? I would like to
> >>>>>>>>>>> better understand your scenario so that we can plan the coming
> >>>>>>>>>>> features well.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Best Regards!
> >>>>>>>>>>> ---------------------
> >>>>>>>>>>>
> >>>>>>>>>>> Luke Han
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, Jul 29, 2015 at 2:08 PM, hongbin ma <[email protected]>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> For the current 0.7 releases, you cannot.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Real-time data processing and querying will be added in the 0.8
> >>>>>>>>>>>> release. It is still under development and testing. We have made
> >>>>>>>>>>>> good progress on it; please wait for announcements.
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Wed, Jul 29, 2015 at 2:02 PM, Siddharth Ubale <
> >>>>>>>>>>>> [email protected]> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I would like to ask whether Kylin can be used as a real-time
> >>>>>>>>>>>>> querying system.
> >>>>>>>>>>>>> The process of building a cube makes it look like a batch process,
> >>>>>>>>>>>>> after which the queries run with low latency. However, can we get
> >>>>>>>>>>>>> a real-time view of the OLAP system's state at the instant of the
> >>>>>>>>>>>>> query?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>> Siddharth
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> --
> >>>>>>>>>>>> Regards,
> >>>>>>>>>>>>
> >>>>>>>>>>>> *Bin Mahone | 马洪宾*
> >>>>>>>>>>>> Apache Kylin: http://kylin.io
> >>>>>>>>>>>> Github: https://github.com/binmahone
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >
>
>
> --
> Regards,
>
> *Bin Mahone | 马洪宾*
> Apache Kylin: http://kylin.io
> Github: https://github.com/binmahone
>