Sean-Gu commented on a change in pull request #982: Realtime doc for lambda mode
URL: https://github.com/apache/kylin/pull/982#discussion_r357552949
 
 

 ##########
 File path: website/_docs30/tutorial/lambda_mode_and_timezone_realtime_olap.md
 ##########
 @@ -0,0 +1,174 @@
+---
+layout: docs30
+title:  Lambda mode and Timezone in Real-time OLAP
+categories: tutorial
+permalink: /docs30/tutorial/lambda_mode_and_timezone_realtime_olap.html
+---
+
+Kylin v3.0.0 introduces real-time OLAP. With the newly added streaming 
receiver cluster, Kylin can query streaming data with sub-second latency. 
See [this tech blog](/blog/2019/04/12/rt-streaming-design/) for the overall 
design and core concepts. 
+
+For a step-by-step tutorial, please check [this 
tutorial](/docs30/tutorial/realtime_olap.html).
+This article explains how to update segments and how to set the timezone 
for derived time columns in a real-time OLAP cube. 
+
+# Background
+
+Say we have Kafka messages that look like this:
+
+{% highlight Groff markup %}
+{
+    "s_nation":"SAUDI ARABIA",
+    "lo_supplycost":74292,
+    "p_category":"MFGR#0910",
+    "local_day_hour_minute":"09_21_44",
+    "event_time":"2019-12-09 08:44:50.000-0500",
+    "local_day_hour":"09_21",
+    "lo_quantity":12,
+    "lo_revenue":1411548,
+    "p_brand":"MFGR#0910051",
+    "s_region":"MIDDLE EAST",
+    "lo_discount":5,
+    "customer_info":{
+        "CITY":"CHINA    057",
+        "REGION":"ASIA",
+        "street":"CHINA    05721",
+        "NATION":"CHINA"
+    },
+    "d_year":1994,
+    "d_weeknuminyear":30,
+    "p_mfgr":"MFGR#09",
+    "v_revenue":7429200,
+    "d_yearmonth":"Jul1994",
+    "s_city":"SAUDI ARA15",
+    "profit_ratio":0.05263157894736842,
+    "d_yearmonthnum":199407,
+    "round":1
+}
+{% endhighlight %}
+
+This sample comes from SSB, with some additional fields such as 
*event_time*, which holds the timestamp of the current event. 
+We assume that events come from countries in different timezones. 
"2019-12-09 08:44:50.000-0500" indicates an event that comes from the 
'America/New_York' timezone. You may have some events that come from 
'Asia/Shanghai' as well.
+
+*local_day_hour_minute* is a column whose value is in the local timezone; in 
this sample it is in "GMT+8".
+
+### Question
+You want to do some real-time analysis, so you may consider using Realtime 
OLAP. But you may have some concerns, including:
+
+1. Events come from different timezones. Will this cause trouble or 
incorrect query results?
+2. In some cases, a Kafka message contains a value that is not what you 
actually want, say a misspelled dimension value. How can you make 
corrections? (Or you may want to retrieve some long-late messages that were dropped.)
+3. My query only hits a small time range. How should I write the filter 
condition to make sure unused segments are pruned or skipped from the scan?
+
+### Quick Answer
+Firstly, you can always get correct results in your local timezone by 
setting *kylin.stream.event.timezone=GMT+N* for all Kylin processes. 
By default, UTC is used for *derived time columns*.
+
+Secondly, you cannot update a normal streaming cube, but you can 
update a streaming cube that is in lambda mode. All you need to prepare is 
a Hive table that maps to your Kafka events.
+
+Thirdly, you can achieve this by adding *derived time columns* such as 
*MINUTE_START*/*DAY_START* to your filter condition.
+
+# How to do
+
+### Configure timezone
+We knew message may come from different timezone, but you want the query 
result should stick to some specific timezone. 
 
 Review comment:
   knew -> know
   but you want query results using some specific timezone. 
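
   For illustration only, here is a minimal Python sketch of the conversion this
   paragraph is about. It takes the sample *event_time* from the message in the
   diff and expresses it in a fixed GMT+8 offset. The helper name
   `to_fixed_offset` and the `%d_%H_%M` layout are assumptions made for this
   example; this is not how Kylin computes derived time columns internally.

```python
from datetime import datetime, timedelta, timezone

# Sample value copied from the Kafka message in the doc under review.
EVENT_TIME = "2019-12-09 08:44:50.000-0500"


def to_fixed_offset(event_time: str, offset_hours: int) -> datetime:
    """Parse an event_time string and express it in a fixed GMT+N offset.

    Hypothetical helper for illustration; not part of Kylin.
    """
    parsed = datetime.strptime(event_time, "%Y-%m-%d %H:%M:%S.%f%z")
    return parsed.astimezone(timezone(timedelta(hours=offset_hours)))


# Express the New York event in GMT+8, the offset the sample uses for its
# local_day_hour_minute field.
local = to_fixed_offset(EVENT_TIME, 8)
local_day_hour_minute = local.strftime("%d_%H_%M")
print(local_day_hour_minute)
```

   The same instant reads as hour 08 in the sender's offset and hour 21 in
   GMT+8, which is why the doc asks users to pick one timezone for query results.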

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
