[ 
https://issues.apache.org/jira/browse/KYLIN-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu updated KYLIN-4010:
--------------------------------
    Description: 
h2. Backgroud
In realtime OLAP, we index real-time event in streaming receiver. We know that 
each event must contains a timestamp column (we often call it event time), that 
value should represent when this event was produced. Because event maybe come 
from different timezone and use local timezone is always *error-prone*, so we 
recommend to use a {color:#DE350B}GMT+0{color} 
timestamp(System.currentTimeMillis()) to avoid such issue.

I think this is good by design, it is easy to understand and always correct. 
But the *side effect* is that, the end user(business manager behind a BI tools) 
are unhappy because he have to use GMT+0 with date/time related filter in SQL 
and should understand the result should be *shifted* with his local timezone. 
It is not user-firendly and inconvenient for normal user. Because user may 
compare query result from different data source and compare them and summarize, 
use GMT+0 may trouble them.

h2. Example
For example, kylin user work in *GMT+8* (maybe in Shanghai) want to know some 
metrics which occured from {color:#DE350B}2019-09-01 12:00:00{color} to 
{color:#DE350B}2019-09-01 14:00:00{color} in his {color:#DE350B}local 
timezone{color}, so he has to {color:#DE350B}rewrite{color} his query (with 
eight hour offset) to following: 


{code:sql}
select hour_start, count(*)
from realtime_table
where hour_start >= "2019-09-01 04:00:00" and hour_start < "2019-09-01 
06:00:00"  
group by hour_start
{code}


And he will get result like :

||hour_start ||count||
|2019-09-01 04:00:00    |139202|
|2019-09-01 05:00:00    |89398|

And he must convert to a more meaningful result in his mind, it is realy 
annoying!

||hour_start ||count||
|2019-09-01 12:00:00    |139202|
|2019-09-01 13:00:00    |89398|



h2. Desgin
We should not change the way receiver index event, event time should be stored 
in UTC timestamp. We should auto rewrite sql's event time related filter. 

In kylin, filter condition in where clause will be convert to a *TupleFilter*, 
and it looks like *RelNode* in Apache Calicate.
For where hour_start >= "2019-09-01 12:00:00" and hour_start < "2019-09-01 
14:00:00", we will send TupleFilter to streaming receiver or region server 
which looks like this:


{noformat}
AND
  GreatThanOrEqual
    hout_start
    CAST
      "2019-09-01 12:00:00"
      timestamp
  LessThanOrEqual
    hout_start
    CAST
      "2019-09-01 14:00:00"
      timestamp
{noformat}


But for streaming query, we want to change each ConstantTupleFilter and minus 
value for that timestamp. So the TupleFilter which be sent will be following:


{noformat}
AND
  GreatThanOrEqual
    hout_start
    CAST
      "2019-09-01 04:00:00"
      timestamp
  LessThanOrEqual
    hout_start
    CAST
      "2019-09-01 06:00:00"
      timestamp
{noformat}


Before query result processed by *OLAPEnumerator*,  kylin will plus each value 
of time derived column, thus protect row from be filtered by calcite generated 
code.

So, user will get what he want in his timezone without any burden.

h2. How to use
To enable auto shift by time zone, please set 
{color:#DE350B}kylin.stream.auto.just.by.timezone{color} to true.
You can specific time zone by {color:#DE350B}kylin.web.timezone{color}, 
otherwise, time zone will be auto detected.
Only *time derived column* will be affected.
h2. Related Issue
Originally, the event time can only in the format of a long value (UTC 
timestamp). But in some case, the event time is in a format of "yyyy-MM-dd 
HH:mm:ss", we use a new class DateTimeParser(introduced in KYLIN-4001) to 
convert such format into a UTC timestamp.


h3. Old Describe
In Real-Time Streaming Cube when I send some records to kafka topic, the 
tmestamp for the record is 2019-01-01 00:00:00.000, but kylin create a segment 
named 20181231160000_20181231170000.

Then I found that TimeZone is hard-coded to "GMT" in function makeSegmentName 
for class CubeSegment. I think that it should be config in kylin.properties.

  was:
h2. Backgroud
In realtime OLAP, we index real-time event in streaming receiver. We know that 
each event must contains a timestamp column (we often call it event time), that 
value should represent when this event was produced. Because event maybe come 
from different timezone and use local timezone is always error-prone, so we 
recommend to use a GMT+0 timestamp(System.currentTimeMillis()) to avoid such 
issue.

I think this is good by design, it is easy to understand and always correct. 
But the side effect is that, the end user(business manager behind a BI tools) 
are unhappy because he have to use GMT+0 with date/time related filter in SQL 
and should understand the result should be shifted with his local timezone. It 
is not user-firendly and inconvenient for normal user. Because user may compare 
query result from different data source and compare them and summarize, use 
GMT+0 may trouble them.

h2. Example
For example, kylin user work in GMT+8 (maybe in Shanghai) want to know some 
metrics which occured from {color:#DE350B}2019-09-01 12:00:00{color} to 
{color:#DE350B}2019-09-01 14:00:00{color} in his {color:#DE350B}local 
timezone{color}, so he has to {color:#DE350B}rewrite{color} his query (with 
eight hour offset) to following: 


{code:sql}
select hour_start, count(*)
from realtime_table
where hour_start >= "2019-09-01 04:00:00" and hour_start < "2019-09-01 
06:00:00"  
group by hour_start
{code}


And he will get result like :

||hour_start ||count||
|2019-09-01 04:00:00    |139202|
|2019-09-01 05:00:00    |89398|

And he must convert to a more meaningful result in his mind, it is realy 
annoying!

||hour_start ||count||
|2019-09-01 12:00:00    |139202|
|2019-09-01 13:00:00    |89398|



h2. Desgin
We should not change the way receiver index event, event time should be stored 
in UTC timestamp. We should auto rewrite sql's event time related filter. 

In kylin, filter condition in where clause will be convert to a *TupleFilter*, 
and it looks like *RelNode* in Apache Calicate.
For where hour_start >= "2019-09-01 12:00:00" and hour_start < "2019-09-01 
14:00:00", we will send TupleFilter to streaming receiver or region server 
which looks like this:


{noformat}
AND
  GreatThanOrEqual
    hout_start
    CAST
      "2019-09-01 12:00:00"
      timestamp
  LessThanOrEqual
    hout_start
    CAST
      "2019-09-01 14:00:00"
      timestamp
{noformat}


But for streaming query, we want to change each ConstantTupleFilter and minus 
value for that timestamp. So the TupleFilter which be sent will be following:


{noformat}
AND
  GreatThanOrEqual
    hout_start
    CAST
      "2019-09-01 04:00:00"
      timestamp
  LessThanOrEqual
    hout_start
    CAST
      "2019-09-01 06:00:00"
      timestamp
{noformat}


Before query result processed by *OLAPEnumerator*,  kylin will plus each value 
of time derived column, thus protect row from be filtered by calcite generated 
code.

So, user will get what he want in his timezone without any burden.

h2. How to use
To enable auto shift by time zone, please set kylin.stream.auto-timezone to 
true.
You can specific time zone by kylin.web.timezone, otherwise, time zone will be 
auto detected.
Only *time derived column* will be affected.
h2. Related Issue
Originally, the event time can only in the format of a long value (UTC 
timestamp). But in some case, the event time is in a format of "yyyy-MM-dd 
HH:mm:ss", we use a new class DateTimeParser(introduced in KYLIN-4001) to 
convert such format into a UTC timestamp.


In Real-Time Streaming Cube when I send some records to kafka topic, the 
tmestamp for the record is 2019-01-01 00:00:00.000, but kylin create a segment 
named 20181231160000_20181231170000.

Then I found that TimeZone is hard-coded to "GMT" in function makeSegmentName 
for class CubeSegment. I think that it should be config in kylin.properties.


> Auto adjust offset according to query server's timezone for time derived 
> column
> -------------------------------------------------------------------------------
>
>                 Key: KYLIN-4010
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4010
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Others
>    Affects Versions: v3.0.0-alpha
>            Reporter: zengrui
>            Assignee: Xiaoxiang Yu
>            Priority: Minor
>             Fix For: v3.0.0-beta
>
>         Attachments: image-2019-07-15-17-15-31-209.png, 
> image-2019-07-15-17-17-04-029.png, image-2019-07-15-17-17-39-568.png
>
>
> h2. Backgroud
> In realtime OLAP, we index real-time event in streaming receiver. We know 
> that each event must contains a timestamp column (we often call it event 
> time), that value should represent when this event was produced. Because 
> event maybe come from different timezone and use local timezone is always 
> *error-prone*, so we recommend to use a {color:#DE350B}GMT+0{color} 
> timestamp(System.currentTimeMillis()) to avoid such issue.
> I think this is good by design, it is easy to understand and always correct. 
> But the *side effect* is that, the end user(business manager behind a BI 
> tools) are unhappy because he have to use GMT+0 with date/time related filter 
> in SQL and should understand the result should be *shifted* with his local 
> timezone. It is not user-firendly and inconvenient for normal user. Because 
> user may compare query result from different data source and compare them and 
> summarize, use GMT+0 may trouble them.
> h2. Example
> For example, kylin user work in *GMT+8* (maybe in Shanghai) want to know some 
> metrics which occured from {color:#DE350B}2019-09-01 12:00:00{color} to 
> {color:#DE350B}2019-09-01 14:00:00{color} in his {color:#DE350B}local 
> timezone{color}, so he has to {color:#DE350B}rewrite{color} his query (with 
> eight hour offset) to following: 
> {code:sql}
> select hour_start, count(*)
> from realtime_table
> where hour_start >= "2019-09-01 04:00:00" and hour_start < "2019-09-01 
> 06:00:00"  
> group by hour_start
> {code}
> And he will get result like :
> ||hour_start ||count||
> |2019-09-01 04:00:00  |139202|
> |2019-09-01 05:00:00  |89398|
> And he must convert to a more meaningful result in his mind, it is realy 
> annoying!
> ||hour_start ||count||
> |2019-09-01 12:00:00  |139202|
> |2019-09-01 13:00:00  |89398|
> h2. Desgin
> We should not change the way receiver index event, event time should be 
> stored in UTC timestamp. We should auto rewrite sql's event time related 
> filter. 
> In kylin, filter condition in where clause will be convert to a 
> *TupleFilter*, and it looks like *RelNode* in Apache Calicate.
> For where hour_start >= "2019-09-01 12:00:00" and hour_start < "2019-09-01 
> 14:00:00", we will send TupleFilter to streaming receiver or region server 
> which looks like this:
> {noformat}
> AND
>   GreatThanOrEqual
>     hout_start
>     CAST
>       "2019-09-01 12:00:00"
>       timestamp
>   LessThanOrEqual
>     hout_start
>     CAST
>       "2019-09-01 14:00:00"
>       timestamp
> {noformat}
> But for streaming query, we want to change each ConstantTupleFilter and minus 
> value for that timestamp. So the TupleFilter which be sent will be following:
> {noformat}
> AND
>   GreatThanOrEqual
>     hout_start
>     CAST
>       "2019-09-01 04:00:00"
>       timestamp
>   LessThanOrEqual
>     hout_start
>     CAST
>       "2019-09-01 06:00:00"
>       timestamp
> {noformat}
> Before query result processed by *OLAPEnumerator*,  kylin will plus each 
> value of time derived column, thus protect row from be filtered by calcite 
> generated code.
> So, user will get what he want in his timezone without any burden.
> h2. How to use
> To enable auto shift by time zone, please set 
> {color:#DE350B}kylin.stream.auto.just.by.timezone{color} to true.
> You can specific time zone by {color:#DE350B}kylin.web.timezone{color}, 
> otherwise, time zone will be auto detected.
> Only *time derived column* will be affected.
> h2. Related Issue
> Originally, the event time can only in the format of a long value (UTC 
> timestamp). But in some case, the event time is in a format of "yyyy-MM-dd 
> HH:mm:ss", we use a new class DateTimeParser(introduced in KYLIN-4001) to 
> convert such format into a UTC timestamp.
> h3. Old Describe
> In Real-Time Streaming Cube when I send some records to kafka topic, the 
> tmestamp for the record is 2019-01-01 00:00:00.000, but kylin create a 
> segment named 20181231160000_20181231170000.
> Then I found that TimeZone is hard-coded to "GMT" in function makeSegmentName 
> for class CubeSegment. I think that it should be config in kylin.properties.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to