[ 
https://issues.apache.org/jira/browse/KYLIN-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Li closed KYLIN-1513.
--------------------------

Resolved in release 1.5.1 (2016-04-13)

> Time partitioning doesn't work across multiple days
> ---------------------------------------------------
>
>                 Key: KYLIN-1513
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1513
>             Project: Kylin
>          Issue Type: Bug
>    Affects Versions: v1.5.0
>            Reporter: Nick Muerdter
>            Assignee: Shaofeng SHI
>             Fix For: v1.5.1
>
>
> I was attempting to use the new time partition functionality added in v1.5.0 
> (https://issues.apache.org/jira/browse/KYLIN-1427). I realize this hasn't 
> been added to the interface yet 
> (https://issues.apache.org/jira/browse/KYLIN-1441), so I'm not sure if this 
> is feature is completely ready yet, but I thought I'd note the issue I ran 
> into:
> - Using the API, I defined the `partition_time_column` and 
> `partition_time_format` attributes on my model:
>     {code}
> {
>   "partition_desc": {
>     "partition_date_column": "DEFAULT.LOGS.CAL_DATE",
>     "partition_date_format": "yyyy-MM-dd",
>     "partition_date_column": "DEFAULT.LOGS.CAL_HOUR",
>     "partition_date_format": "H",
>     "partition_date_start": null,
>     "partition_type": "APPEND"
>   }
> }
>     {code}
> - I then attempted to build the cube for the first time over a multi-day 
> duration (2010-08-16 to 2010-11-01).
> - The cube reported successfully building, but the resulting cube contained 
> no data.
> When I looked at the Kylin logs, it looks like the SQL generated by the time 
> partition won't work if the query spans multiple days. In my case, the WHERE 
> clause used for 2010-08-16 to 2010-11-01 was:
> {code}
> WHERE (LOGS.CAL_DATE >= '2010-08-16' AND LOGS.CAL_DATE < '2010-11-01' AND 
> LOGS.CAL_HOUR >= '0' AND LOGS.CAL_HOUR < '0')
> {code}
> The issues seems to be that the hour condition is ANDed to the end of the 
> existing date range query: 
> https://github.com/apache/kylin/blob/kylin-1.5.0/core-metadata/src/main/java/org/apache/kylin/metadata/model/PartitionDesc.java#L170-L174
>  However, this logic doesn't work when the date range spans multiple days. In 
> this case, it's trying to match where the hour column is both >= 0 and < 0 
> (since I was processing midnight to midnight), which will never match any 
> results. However, if I switched my cube build to end at 02:00, I believe this 
> would lead to some results being built in the cube, but not what's expected 
> (it would only process 00:00-02:00 on each day.
> So while I think the current implementation will work as long as you're 
> building less than 24 hours of data at a time, it would be nice if this could 
> still support multiple-day builds when this time partition is also present.
> I think when a separate time column is present, the SQL generated would need 
> to be something more like:
> {code}
> WHERE
>   (LOGS.CAL_DATE = '2010-08-16' AND LOGS.CAL_HOUR >= '0') OR
>   (LOGS.CAL_DATE > '2010-08-16' AND LOGS.CAL_DATE < '2010-11-01') OR
>   (LOGS.CAL_DATE = '2010-11-01' AND LOGS.CAL_HOUR < '0')
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to