Re: Re: Kylin Real time

Li Yang Mon, 21 Sep 2015 02:37:43 -0700

Gas is mostly right, with one addition: a query can hit both the
inverted index and the cube if it asks for both the latest and historical
data. The results from the two sources are then aggregated at query time.
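
As a rough illustration of that query-time aggregation (a minimal sketch,
not Kylin's actual implementation): assume each source returns partial SUM
aggregates keyed by the group-by value, and that the cube and the inverted
index cover disjoint time ranges. The function name and the sample rows
below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One partial result row: (group-by key, partial sum). Hypothetical shape.
Row = Tuple[str, int]


def merge_partial_sums(*sources: Iterable[Row]) -> Dict[str, int]:
    """Combine partial SUM aggregates from several sources by group key."""
    totals: Dict[str, int] = defaultdict(int)
    for rows in sources:
        for key, partial in rows:
            totals[key] += partial
    return dict(totals)


# Hypothetical partial results for "quantity sold per product".
cube_rows = [("A", 100), ("B", 40)]  # historical data, pre-aggregated in the cube
index_rows = [("A", 5), ("C", 7)]    # latest data, scanned from the inverted index

merged = merge_partial_sums(cube_rows, index_rows)
```

This only works as written for additive measures such as SUM or COUNT;
measures like distinct count would need a different merge step.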


On Fri, Sep 18, 2015 at 11:26 PM, Gaspare Maria <
[email protected]> wrote:

> Hi,
>
> So, if I understood correctly, the idea behind Kylin Real Time is:
>
>  * Inverted indexes (maybe Lucene, or inverted indexes on HBase) will
>    be built according to the cube schema in near real time by Spark
>    (Streaming) Kafka consumers;
>  * At query time, a query that touches the latest data will be routed
>    to the inverted indexes; otherwise it goes to the cube on HBase;
>  * Queries that touch the latest data should be limited, due to the
>    limitations of inverted indexes;
>  * A query over a long period back in time (e.g. from now back to 2
>    months ago) will be routed partly to HBase and partly to the
>    inverted indexes.
>
>
> Am I right?
>
> Regards,
>
> -- gas
>
>
>
> On 09/18/2015 12:35 AM, Henry Saputra wrote:
>
>> Awesome, thanks Luke
>>
>> On Thu, Sep 17, 2015 at 2:37 AM, Luke Han <[email protected]> wrote:
>>
>>> Here's JIRA: https://issues.apache.org/jira/browse/KYLIN-599
>>>
>>>
>>> Best Regards!
>>> ---------------------
>>>
>>> Luke Han
>>>
>>> On Thu, Sep 17, 2015 at 1:09 AM, Henry Saputra <[email protected]>
>>> wrote:
>>>
>>>> That is good to know. Li Yang, Luke, could one of you share the design
>>>> document for this realtime OLAP query in the JIRA?
>>>>
>>>> Thanks,
>>>>
>>>> - Henry
>>>>
>>>> On Tue, Sep 15, 2015 at 11:12 PM, Li Yang <[email protected]> wrote:
>>>>
>>>>>> There will be incremental updates on the existing cubes, but during
>>>>>> those updates I suppose no queries will be run against them?
>>>>>>
>>>>> Yes, it's mini batch, usually at intervals of a few minutes. And of
>>>>> course the cube CAN serve queries while the mini incremental build is
>>>>> in progress. We cannot take the cube offline every few minutes;
>>>>> that's impossible.  :-)
>>>>>
>>>>> On Tue, Sep 15, 2015 at 6:41 PM, Sarnath <[email protected]> wrote:
>>>>>
>>>>>> Inverted index? That sounds interesting. We use inverted index to serve
>>>>>> the cubes in our internal implementation.
>>>>>>
>>>>>> I come from Big Data Center of excellence from an Indian IT major.
>>>>>>
>>>>>> We have been experimenting with the idea of serving cubes through
>>>>>> ElasticSearch REST API. This is not related to Kylin. This is our own
>>>>>> internal development.
>>>>>>
>>>>>> The motivation for this is --- Once the cube is built, it needs to be
>>>>>> served.
>>>>>>
>>>>>> The query looks somewhat like this:
>>>>>>
>>>>>> "Given ProductID=*, Year=2015, Fetch All Quantities Sold"
>>>>>>
>>>>>> "Given ProductID=XX, Fetch how much it has sold every Month"
>>>>>>
>>>>>> Find all entries that match K1=V1, K2=V2
>>>>>>
>>>>>> This relieves us from a lot of things - storage, REST API etc. - and
>>>>>> makes the cubes easily searchable.
>>>>>>
>>>>>> However, we don't do SQL/MDX on top of it.  Tableau 9.1 Beta is
>>>>>> experimenting with Web-Data-Connector which we believe can be used for
>>>>>> Visualization... Apart from that, we experimented with a few
>>>>>> auto-generated Kibana dashboards which were just okay. But Kibana was
>>>>>> not designed for Cubes and so it has its own limitations.
>>>>>>
>>>>>> Appreciate any feedback!
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Sarnath
>>>>>>
>>>>>> I also think that it's mini-batch cubing.   It's time to bring the
>>>>>> inverted index back into the roadmap. The inverted index will be the
>>>>>> true real-time solution and can provide low-level query capability on
>>>>>> the raw data.
>>>>>>
>>>>>>
>>>>>> Thanks!
>>>>>> JiangXu
>>>>>>
>>>>>>
>>>>>> ------------------ Original Message ------------------
>>>>>> From: "Henry Saputra";<[email protected]>;
>>>>>> Sent: Tuesday, September 15, 2015, 12:39 PM
>>>>>> To: "[email protected]"<[email protected]>;
>>>>>>
>>>>>> Subject: Re: Kylin Real time
>>>>>>
>>>>>>
>>>>>>
>>>>>> Ok, but that still seems like mini batch to me.
>>>>>>
>>>>>> There will be incremental updates on the existing cubes, but during
>>>>>> those updates I suppose no queries will be run against them?
>>>>>>
>>>>>> - Henry
>>>>>>
>>>>>> On Mon, Sep 14, 2015 at 12:33 AM, Li Yang <[email protected]> wrote:
>>>>>>
>>>>>>> Streaming OLAP provides Near-Realtime analysis where data delay can
>>>>>>> be as short as a few minutes.
>>>>>>>
>>>>>>> Traditional daily build allows user to analyze yesterday's data. If
>>>>>>> we increase the frequency to hourly, then user can analyze last hour's
>>>>>>> data. Further down the line, how about an incremental build every 5
>>>>>>> minutes from a streaming source? Then user can analyze data from 5
>>>>>>> minutes ago. That's Streaming OLAP!
>>>>>>>
>>>>>>> On Mon, Sep 14, 2015 at 12:43 AM, Henry Saputra <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Luke,
>>>>>>>>
>>>>>>>> Could you clarify again what streaming OLAP means here?
>>>>>>>>
>>>>>>>> By definition, OLAP works with historical data.
>>>>>>>>
>>>>>>>> Maybe I missed it, but was there any discussion or proposed design
>>>>>>>> for it?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> - Henry
>>>>>>>>
>>>>>>>> On Monday, August 3, 2015, Luke Han <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Siddharth,
>>>>>>>>>      Kylin's next major release (0.8.x) will support Streaming
>>>>>>>>> OLAP, which will be coming in Q4 since it is still under development
>>>>>>>>> now, as Hongbin mentioned above.
>>>>>>>>>      Could you please drop me a mail about your case? I would like
>>>>>>>>> to better understand your scenario to better manage the coming
>>>>>>>>> features.
>>>>>>>>>
>>>>>>>>>      Thanks.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Best Regards!
>>>>>>>>> ---------------------
>>>>>>>>>
>>>>>>>>> Luke Han
>>>>>>>>>
>>>>>>>>> On Wed, Jul 29, 2015 at 2:08 PM, hongbin ma <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> For current 0.7 releases, you cannot.
>>>>>>>>>>
>>>>>>>>>> Real time data processing and querying will be added in the 0.8
>>>>>>>>>> release. It is still under development and testing. We have made
>>>>>>>>>> good progress on it; please wait for announcements.
>>>>>>>>>>
>>>>>>>>>> On Wed, Jul 29, 2015 at 2:02 PM, Siddharth Ubale <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I would like to ask whether Kylin can be used as a real time
>>>>>>>>>>> querying system?
>>>>>>>>>>> The process of building a cube makes it look like a batch
>>>>>>>>>>> process, after which the queries have low latency. However, can
>>>>>>>>>>> we get a real time idea of what the OLAP system's state is at
>>>>>>>>>>> the time of the query?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Siddharth
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Regards,
>>>>>>>>>>
>>>>>>>>>> *Bin Mahone | 马洪宾*
>>>>>>>>>> Apache Kylin: http://kylin.io
>>>>>>>>>> Github: https://github.com/binmahone
>>>>>>>>>>
>>>>>>>>>>
>
