Sean-Gu commented on a change in pull request #982: Realtime doc for lambda mode
URL: https://github.com/apache/kylin/pull/982#discussion_r357548731
##########
File path: website/_docs30/tutorial/lambda_mode_and_timezone_realtime_olap.md
##########
@@ -0,0 +1,174 @@
+---
+layout: docs30
+title: Lambda mode and Timezone in Real-time OLAP
+categories: tutorial
+permalink: /docs30/tutorial/lambda_mode_and_timezone_realtime_olap.html
+---
+
+Kylin v3.0.0 will release the real-time OLAP feature. Powered by the newly
added streaming receiver cluster, Kylin can query streaming data with
sub-second latency. You can check [this tech
blog](/blog/2019/04/12/rt-streaming-design/) for the overall design and core
concepts.
+
+If you want a step-by-step tutorial, please check [this
tutorial](/docs30/tutorial/realtime_olap.html).
+In this article, we will introduce how to update segments and set the timezone
for derived time columns in a realtime OLAP cube.
+
+# Background
+
+Say we have Kafka messages that look like this:
+
+{% highlight Groff markup %}
+{
+ "s_nation":"SAUDI ARABIA",
+ "lo_supplycost":74292,
+ "p_category":"MFGR#0910",
+ "local_day_hour_minute":"09_21_44",
+ "event_time":"2019-12-09 08:44:50.000-0500",
+ "local_day_hour":"09_21",
+ "lo_quantity":12,
+ "lo_revenue":1411548,
+ "p_brand":"MFGR#0910051",
+ "s_region":"MIDDLE EAST",
+ "lo_discount":5,
+ "customer_info":{
+ "CITY":"CHINA 057",
+ "REGION":"ASIA",
+ "street":"CHINA 05721",
+ "NATION":"CHINA"
+ },
+ "d_year":1994,
+ "d_weeknuminyear":30,
+ "p_mfgr":"MFGR#09",
+ "v_revenue":7429200,
+ "d_yearmonth":"Jul1994",
+ "s_city":"SAUDI ARA15",
+ "profit_ratio":0.05263157894736842,
+ "d_yearmonthnum":199407,
+ "round":1
+}
+{% endhighlight %}
+
+This sample comes from SSB, with some additional fields such as
*event_time*. The field *event_time* is the timestamp of the current event.
+We assume that events come from countries in different timezones;
"2019-12-09 08:44:50.000-0500" indicates that this is an event which comes from
the 'America/New_York' timezone. You may have some events which come from
'Asia/Shanghai' as well.
+
+*local_day_hour_minute* is a column whose value is in the local timezone; in
this sample it is in "GMT+8".
+
+### Question
+Suppose you want to do some real-time analysis on this data, so you consider
using Realtime OLAP. But you may have some concerns, including:
+
+1. Since events come from different timezones, you may worry whether this
will cause trouble or incorrect query results.
+2. In some cases, a Kafka message contains a value which is not actually what
you want; say some dimension value is misspelled. How can you make
corrections? (Or you may want to retrieve some very late messages which were dropped.)
+3. My query only hit a small range of time range, how should I write filter
condition make sure unused segments purged/skipped from scan?
Review comment:
time range -> time
make -> to make
are purged/skipped