Re: Reply: Kylin Real time

Gaspare Maria Fri, 18 Sep 2015 08:28:35 -0700

Hi,

So, if I understood correctly, the idea behind Kylin Real Time is:

 * Inverted indexes (maybe Lucene, or inverted indexes on HBase) will
   be built according to the cube schema in near real time by Spark
   (Streaming) Kafka consumers;
 * At query time, if the query touches the latest data it will be
   routed to the inverted indexes, otherwise to the cube on HBase;
 * Queries that touch the latest data should be limited, due to the
   limitations of inverted indexes;
 * Queries over a long period back in time (e.g. from now back to 2
   months ago) will be routed partly to HBase and partly to the
   inverted indexes.


Am I right?

Regards,

-- gas


On 09/18/2015 12:35 AM, Henry Saputra wrote:

Awesome, thanks Luke

On Thu, Sep 17, 2015 at 2:37 AM, Luke Han <[email protected]> wrote:

Here's JIRA: https://issues.apache.org/jira/browse/KYLIN-599


Best Regards!
---------------------

Luke Han

On Thu, Sep 17, 2015 at 1:09 AM, Henry Saputra <[email protected]>
wrote:

That is good to know. Li Yang, Luke, could one of you share the design
document for this realtime OLAP query in the JIRA?

Thanks,

- Henry

On Tue, Sep 15, 2015 at 11:12 PM, Li Yang <[email protected]> wrote:

There will be incremental updates on the existing cubes, but during
those updates I suppose no queries will be run against them?

Yes, it's mini batch, usually at intervals of a few minutes. And of
course the cube CAN serve queries while the mini incremental build is
under way. We can't take the cube offline every few minutes; that's
impossible.  :-)

On Tue, Sep 15, 2015 at 6:41 PM, Sarnath <[email protected]> wrote:

Inverted index? That sounds interesting. We use inverted index to serve
the cubes in our internal implementation.

I come from the Big Data Center of Excellence at a major Indian IT
company.

We have been experimenting with the idea of serving cubes through
ElasticSearch REST API. This is not related to Kylin. This is our own
internal development.

The motivation for this is --- Once the cube is built, it needs to be
served.

The query looks somewhat like this:

"Given ProductID=*, Year=2015, Fetch All Quantities Sold"

"Given ProductID=XX, Fetch how much it has sold every Month"

Find all entries that match K1=V1, K2=V2

This relieves us of a lot of things (storage, REST API, etc.) and makes
the cubes easily searchable.
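
To make the "match K1=V1, K2=V2" lookup concrete, here is a minimal
in-memory Python sketch of an inverted index over pre-aggregated cube
rows. It is only an illustration under assumptions: it does not use
ElasticSearch, and the row layout (`dims`, `quantity`) and the function
names `build_index` and `search` are hypothetical. A wildcard such as
ProductID=* is expressed by simply leaving that key out of the filters.

```python
from collections import defaultdict


def build_index(rows):
    """Map each (dimension, value) pair to the ids of rows containing it."""
    index = defaultdict(set)
    for row_id, row in enumerate(rows):
        for key, value in row["dims"].items():
            index[(key, value)].add(row_id)
    return index


def search(index, rows, **filters):
    """Return the rows matching every key=value filter (logical AND)."""
    if not filters:
        return list(rows)
    # One posting set per filter; a row must appear in all of them.
    postings = [index.get((k, v), set()) for k, v in filters.items()]
    matching_ids = set.intersection(*postings)
    return [rows[i] for i in sorted(matching_ids)]


if __name__ == "__main__":
    rows = [
        {"dims": {"ProductID": "XX", "Year": 2015, "Month": 1}, "quantity": 10},
        {"dims": {"ProductID": "XX", "Year": 2015, "Month": 2}, "quantity": 7},
        {"dims": {"ProductID": "YY", "Year": 2015, "Month": 1}, "quantity": 3},
    ]
    index = build_index(rows)
    # "Given ProductID=*, Year=2015, fetch all quantities sold"
    print([r["quantity"] for r in search(index, rows, Year=2015)])
    # "Given ProductID=XX, fetch how much it has sold every month"
    print({r["dims"]["Month"]: r["quantity"]
           for r in search(index, rows, ProductID="XX")})
```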

However, we don't do SQL/MDX on top of it.  Tableau 9.1 Beta is
experimenting with Web-Data-Connector which we believe can be used for
Visualization... Apart from that, we experimented with a few
auto-generated Kibana dashboards which were just okay. But Kibana was
not designed for Cubes and so it has its own limitations.

Appreciate any feedback!

Thanks,

Best,

Sarnath

I also think that it's a mini batch cubing.   It's time to bring back
the inverted index into roadmap. The inverted index will be the true
real-time solution and can provide the low-level query capability on
the raw data.


Thanks!
JiangXu


------------------ Original Message ------------------
From: "Henry Saputra";<[email protected]>;
Sent: Tuesday, September 15, 2015, 12:39 PM
To: "[email protected]"<[email protected]>;

Subject: Re: Kylin Real time



Ok, but that still seems like mini batch to me.

There will be incremental updates on the existing cubes, but during
those updates I suppose no queries will be run against them?

- Henry

On Mon, Sep 14, 2015 at 12:33 AM, Li Yang <[email protected]> wrote:

Streaming OLAP provides Near-Realtime analysis where data delay can be
as short as a few minutes.

A traditional daily build allows users to analyze yesterday's data. If
we increase the frequency to hourly, then users can analyze the last
hour's data.

Further down the line, how about an incremental build every 5 minutes
from a streaming source? Then users can analyze data from 5 minutes ago.
That's Streaming OLAP!
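
The mini-batch idea above can be sketched in a few lines of Python.
This is only an illustration under assumptions, not how Kylin builds
cubes: events are taken to be hypothetical (timestamp, product_id,
quantity) tuples, and each 5-minute window becomes one small,
independently aggregated segment.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def window_start(ts, interval):
    """Floor a timestamp to the start of its build window."""
    return EPOCH + ((ts - EPOCH) // interval) * interval


def build_segments(events, interval=timedelta(minutes=5)):
    """Aggregate quantity per product inside each build window."""
    segments = defaultdict(lambda: defaultdict(int))
    for ts, product_id, quantity in events:
        segments[window_start(ts, interval)][product_id] += quantity
    # Convert to plain dicts so the result is easy to inspect.
    return {start: dict(totals) for start, totals in segments.items()}


if __name__ == "__main__":
    events = [
        (datetime(2015, 9, 14, 10, 1), "XX", 2),
        (datetime(2015, 9, 14, 10, 4), "XX", 3),
        (datetime(2015, 9, 14, 10, 6), "YY", 1),
    ]
    for start, totals in sorted(build_segments(events).items()):
        print(start, totals)
```

Changing `interval` to one hour or one day gives the hourly and daily
builds mentioned above; only the data delay differs.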

On Mon, Sep 14, 2015 at 12:43 AM, Henry Saputra <[email protected]>
wrote:

Hi Luke,

Could you clarify again what streaming OLAP means here?

By definition, OLAP works with historical data.

Maybe I missed it, but was there any discussion or proposed design for
it?

Thanks,

- Henry

On Monday, August 3, 2015, Luke Han <[email protected]> wrote:

Hi Siddharth,
     Kylin's next major release (0.8.x) will support Streaming OLAP,
which will be coming in Q4 since it is still under development now, as
Hongbin mentioned above.
     Could you please drop me a mail about your case? I would like to
better understand your scenario so that we can plan the upcoming
features well.

     Thanks.




Best Regards!
---------------------

Luke Han

On Wed, Jul 29, 2015 at 2:08 PM, hongbin ma <[email protected]> wrote:

For the current 0.7 releases, you cannot.

Real-time data processing and querying will be added in the 0.8
release. It is still under development and testing. We have made good
progress on it; please wait for announcements.

On Wed, Jul 29, 2015 at 2:02 PM, Siddharth Ubale <[email protected]>
wrote:

Hi ,

I would like to ask whether Kylin can be used as a real-time querying
system.
The process of building a cube makes it look like a batch process,
after which queries run with low latency. However, can we get a
real-time view of the OLAP system's state at the instant of the query?

Thanks,
Siddharth



--
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone
