[jira] [Comment Edited] (KYLIN-4010) Auto adjust offset according to query server's timezone for time derived column

Xiaoxiang Yu (Jira) Sun, 22 Sep 2019 01:37:49 -0700


    [ 
https://issues.apache.org/jira/browse/KYLIN-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935241#comment-16935241
 ]


Xiaoxiang Yu edited comment on KYLIN-4010 at 9/22/19 8:36 AM:
--------------------------------------------------------------

h2. Test

 !image-2019-09-22-16-35-23-663.png! 



{noformat}
2019-09-22 16:34:15,680 INFO  [stream-rpc-pool-t-thread-14] 
rpc.HttpStreamDataSearchClient:178 : send query to receiver cdh-worker-1:7181 
with query id:17665380-c270-6620-b23e-cdb05dd75948
2019-09-22 16:34:15,680 INFO  [stream-rpc-pool-t-thread-11] 
rpc.HttpStreamDataSearchClient:178 : send query to receiver cdh-worker-2:7181 
with query id:17665380-c270-6620-b23e-cdb05dd75948
2019-09-22 16:34:15,680 INFO  [stream-rpc-pool-t-thread-7] 
rpc.HttpStreamDataSearchClient:178 : send query to receiver cdh-client:7292 
with query id:17665380-c270-6620-b23e-cdb05dd75948
2019-09-22 16:34:16,114 INFO  [stream-rpc-pool-t-thread-11] 
rpc.HttpStreamDataSearchClient:189 : 
query-17665380-c270-6620-b23e-cdb05dd75948: receive response from 
cdh-worker-2:7181 take time:433
2019-09-22 16:34:16,115 INFO  [stream-rpc-pool-t-thread-11] 
rpc.HttpStreamDataSearchClient:194 : 
query-17665380-c270-6620-b23e-cdb05dd75948: receiver cdh-worker-2:7181 profile 
info:
start:          9
segments num:   1
segments:       [20190922081000_20190922082000]
skip segments:  [20190922080000_20190922081000]
total files:    8
total file size:37343675
scan rows:      270195
filter rows:    0
final rows:      10


2019-09-22 16:34:16,116 DEBUG [stream-rpc-pool-t-thread-11] 
util.CompressionUtils:71 : Original: 57037 bytes. Decompressed: 59689 bytes. 
Time: 0
2019-09-22 16:34:16,126 INFO  [stream-rpc-pool-t-thread-14] 
rpc.HttpStreamDataSearchClient:189 : 
query-17665380-c270-6620-b23e-cdb05dd75948: receive response from 
cdh-worker-1:7181 take time:445
2019-09-22 16:34:16,127 INFO  [stream-rpc-pool-t-thread-14] 
rpc.HttpStreamDataSearchClient:194 : 
query-17665380-c270-6620-b23e-cdb05dd75948: receiver cdh-worker-1:7181 profile 
info:
start:          8
segments num:   1
segments:       [20190922081000_20190922082000]
skip segments:  [20190922080000_20190922081000]
total files:    8
total file size:45984727
scan rows:      316123
filter rows:    0
final rows:      9


2019-09-22 16:34:16,131 DEBUG [stream-rpc-pool-t-thread-14] 
util.CompressionUtils:71 : Original: 72419 bytes. Decompressed: 75468 bytes. 
Time: 2
2019-09-22 16:34:16,186 INFO  [stream-rpc-pool-t-thread-7] 
rpc.HttpStreamDataSearchClient:189 : 
query-17665380-c270-6620-b23e-cdb05dd75948: receive response from 
cdh-client:7292 take time:504
2019-09-22 16:34:16,187 INFO  [stream-rpc-pool-t-thread-7] 
rpc.HttpStreamDataSearchClient:194 : 
query-17665380-c270-6620-b23e-cdb05dd75948: receiver cdh-client:7292 profile 
info:
start:          6
segments num:   1
segments:       [20190922081000_20190922082000]
skip segments:  [20190922080000_20190922081000]
total files:    8
total file size:51705975
scan rows:      372132
filter rows:    0
final rows:      8


2019-09-22 16:34:16,188 DEBUG [stream-rpc-pool-t-thread-7] 
util.CompressionUtils:71 : Original: 61637 bytes. Decompressed: 64188 bytes. 
Time: 0
2019-09-22 16:34:16,192 INFO  [Query 17665380-c270-6620-b23e-cdb05dd75948-60] 
service.QueryService:1152 : Processed rows for each storageContext: 0
2019-09-22 16:34:16,192 INFO  [Query 17665380-c270-6620-b23e-cdb05dd75948-60] 
service.QueryService:491 : Stats of SQL response: isException: false, duration: 
670, total scan count 0
2019-09-22 16:34:16,193 DEBUG [Query 17665380-c270-6620-b23e-cdb05dd75948-60] 
service.QueryService:502 : Shutdown query cache for realtime.
2019-09-22 16:34:16,193 DEBUG [Query 17665380-c270-6620-b23e-cdb05dd75948-60] 
util.CheckUtil:40 : query cache is disabled
2019-09-22 16:34:16,193 INFO  [Query 17665380-c270-6620-b23e-cdb05dd75948-60] 
service.QueryService:363 :
==========================[QUERY]===============================
Query Id: 17665380-c270-6620-b23e-cdb05dd75948
SQL: select MINUTE_START,
    count(*) as m1,
    count(distinct pageview_id) as pv_id,
    count(distinct str_minute) as str_minute,
    count(distinct str_second) as str_second,
    count(distinct str_minute_second) as str_minute_second
from T1
where MINUTE_START >= '2019-09-22 16:10:00' and MINUTE_START <= '2019-09-22 
16:19:00'
group by MINUTE_START
order by MINUTE_START
limit 100
User: ADMIN
Success: true
Duration: 0.671
Project: 300beta
Realization Names: [CUBE[name=TEST_REALTIME_DICT]]
Cuboid Ids: [1]
Total scan count: 0
Total scan bytes: 0
Result row count: 10
Accept Partial: true
Is Partial Result: false
Hit Exception Cache: false
Storage cache used: false
Is Query Push-Down: false
Is Prepare: false
Trace URL: null
Message: null
=
{noformat}
=========================[QUERY]===============================



was (Author: hit_lacus):
h2. Test

> Auto adjust offset according to query server's timezone for time derived 
> column
> -------------------------------------------------------------------------------
>
>                 Key: KYLIN-4010
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4010
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Others
>    Affects Versions: v3.0.0-alpha
>            Reporter: zengrui
>            Assignee: Xiaoxiang Yu
>            Priority: Minor
>             Fix For: v3.0.0-beta
>
>         Attachments: image-2019-07-15-17-15-31-209.png, 
> image-2019-07-15-17-17-04-029.png, image-2019-07-15-17-17-39-568.png, 
> image-2019-09-22-16-35-23-663.png
>
>
> h2. Backgroud
> In realtime OLAP, we index real-time event in streaming receiver. We know 
> that each event must contains a timestamp column (we often call it event 
> time), that value should represent when this event was produced. Because 
> event maybe come from different timezone and use local timezone is always 
> *error-prone*, so we recommend to use a {color:#DE350B}GMT+0{color} 
> timestamp(System.currentTimeMillis()) to avoid such issue.
> I think this is good by design, it is easy to understand and always correct. 
> But the *side effect* is that, the end user(business manager behind a BI 
> tools) are unhappy because he have to use GMT+0 with date/time related filter 
> in SQL and should understand the result should be *shifted* with his local 
> timezone. It is not user-firendly and inconvenient for normal user. Because 
> user may compare query result from different data source and compare them and 
> summarize, use GMT+0 may trouble them.
> h2. Example
> For example, kylin user work in *GMT+8* (maybe in Shanghai) want to know some 
> metrics which occured from {color:#DE350B}2019-09-01 12:00:00{color} to 
> {color:#DE350B}2019-09-01 14:00:00{color} in his {color:#DE350B}local 
> timezone{color}, so he has to {color:#DE350B}rewrite{color} his query (with 
> eight hour offset) to following: 
> {code:sql}
> select hour_start, count(*)
> from realtime_table
> where hour_start >= "2019-09-01 04:00:00" and hour_start < "2019-09-01 
> 06:00:00"  
> group by hour_start
> {code}
> And he will get result like :
> ||hour_start ||count||
> |2019-09-01 04:00:00  |139202|
> |2019-09-01 05:00:00  |89398|
> And he must convert to a more meaningful result in his mind, it is realy 
> annoying!
> ||hour_start ||count||
> |2019-09-01 12:00:00  |139202|
> |2019-09-01 13:00:00  |89398|
> h2. Desgin
> We should not change the way receiver index event, event time should be 
> stored in UTC timestamp. We should auto rewrite sql's event time related 
> filter. 
> In kylin, filter condition in where clause will be convert to a 
> *TupleFilter*, and it looks like *RelNode* in Apache Calicate.
> For where hour_start >= "2019-09-01 12:00:00" and hour_start < "2019-09-01 
> 14:00:00", we will send TupleFilter to streaming receiver or region server 
> which looks like this:
> {noformat}
> AND
>   GreatThanOrEqual
>     hout_start
>     CAST
>       "2019-09-01 12:00:00"
>       timestamp
>   LessThanOrEqual
>     hout_start
>     CAST
>       "2019-09-01 14:00:00"
>       timestamp
> {noformat}
> But for streaming query, we want to change each ConstantTupleFilter and minus 
> value for that timestamp. So the TupleFilter which be sent will be following:
> {noformat}
> AND
>   GreatThanOrEqual
>     hout_start
>     CAST
>       "2019-09-01 04:00:00"
>       timestamp
>   LessThanOrEqual
>     hout_start
>     CAST
>       "2019-09-01 06:00:00"
>       timestamp
> {noformat}
> Before query result processed by *OLAPEnumerator*,  kylin will plus each 
> value of time derived column, thus protect row from be filtered by calcite 
> generated code.
> So, user will get what he want in his timezone without any burden.
> h2. How to use
> To enable auto shift by time zone, please set 
> {color:#DE350B}kylin.stream.auto.just.by.timezone{color} to true.
> You can specific time zone by {color:#DE350B}kylin.web.timezone{color}, 
> otherwise, time zone will be auto detected.
> Only *time derived column* will be affected.
> h2. Related Issue
> Originally, the event time can only in the format of a long value (UTC 
> timestamp). But in some case, the event time is in a format of "yyyy-MM-dd 
> HH:mm:ss", we use a new class DateTimeParser(introduced in KYLIN-4001) to 
> convert such format into a UTC timestamp.
> h3. Old Describe
> In Real-Time Streaming Cube when I send some records to kafka topic, the 
> tmestamp for the record is 2019-01-01 00:00:00.000, but kylin create a 
> segment named 20181231160000_20181231170000.
> Then I found that TimeZone is hard-coded to "GMT" in function makeSegmentName 
> for class CubeSegment. I think that it should be config in kylin.properties.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Comment Edited] (KYLIN-4010) Auto adjust offset according to query server's timezone for time derived column

Reply via email to