[ https://issues.apache.org/jira/browse/KYLIN-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dong Li closed KYLIN-1513.
--------------------------

Resolved in release 1.5.1 (2016-04-13)

> Time partitioning doesn't work across multiple days
> ---------------------------------------------------
>
>                 Key: KYLIN-1513
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1513
>             Project: Kylin
>          Issue Type: Bug
>    Affects Versions: v1.5.0
>            Reporter: Nick Muerdter
>            Assignee: Shaofeng SHI
>             Fix For: v1.5.1
>
>
> I was attempting to use the new time partition functionality added in v1.5.0
> (https://issues.apache.org/jira/browse/KYLIN-1427). I realize this hasn't
> been added to the interface yet
> (https://issues.apache.org/jira/browse/KYLIN-1441), so I'm not sure if this
> feature is completely ready yet, but I thought I'd note the issue I ran
> into:
> - Using the API, I defined the `partition_time_column` and
> `partition_time_format` attributes on my model:
> {code}
> {
>   "partition_desc": {
>     "partition_date_column": "DEFAULT.LOGS.CAL_DATE",
>     "partition_date_format": "yyyy-MM-dd",
>     "partition_time_column": "DEFAULT.LOGS.CAL_HOUR",
>     "partition_time_format": "H",
>     "partition_date_start": null,
>     "partition_type": "APPEND"
>   }
> }
> {code}
> - I then attempted to build the cube for the first time over a multi-day
> duration (2010-08-16 to 2010-11-01).
> - The cube reported successfully building, but the resulting cube contained
> no data.
> When I looked at the Kylin logs, it looked like the SQL generated by the time
> partition won't work if the query spans multiple days. In my case, the WHERE
> clause used for 2010-08-16 to 2010-11-01 was:
> {code}
> WHERE (LOGS.CAL_DATE >= '2010-08-16' AND LOGS.CAL_DATE < '2010-11-01' AND
> LOGS.CAL_HOUR >= '0' AND LOGS.CAL_HOUR < '0')
> {code}
> The issue seems to be that the hour condition is ANDed to the end of the
> existing date range query:
> https://github.com/apache/kylin/blob/kylin-1.5.0/core-metadata/src/main/java/org/apache/kylin/metadata/model/PartitionDesc.java#L170-L174
> However, this logic doesn't work when the date range spans multiple days. In
> this case, it's trying to match where the hour column is both >= 0 and < 0
> (since I was processing midnight to midnight), which will never match any
> results. However, if I switched my cube build to end at 02:00, I believe this
> would lead to some results being built in the cube, but not what's expected
> (it would only process 00:00-02:00 on each day).
> So while I think the current implementation will work as long as you're
> building less than 24 hours of data at a time, it would be nice if this could
> still support multiple-day builds when this time partition is also present.
> I think when a separate time column is present, the SQL generated would need
> to be something more like:
> {code}
> WHERE
> (LOGS.CAL_DATE = '2010-08-16' AND LOGS.CAL_HOUR >= '0') OR
> (LOGS.CAL_DATE > '2010-08-16' AND LOGS.CAL_DATE < '2010-11-01') OR
> (LOGS.CAL_DATE = '2010-11-01' AND LOGS.CAL_HOUR < '0')
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
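
For illustration, the three-branch condition proposed in the issue can be sketched in Python. This is only a sketch under assumptions, not Kylin's actual `PartitionDesc` code or the fix shipped in v1.5.1: the helper names `build_where`, `anded_match` and `proposed_match` are hypothetical, the range is treated as half-open `[start, end)`, and hour values are quoted in the SQL only because the issue quotes them that way.

```python
from datetime import datetime

DATE_FMT = "%Y-%m-%d"  # mirrors the "yyyy-MM-dd" partition_date_format


def build_where(date_col, hour_col, start, end):
    """Return a WHERE condition for the half-open range [start, end)."""
    s_date, e_date = start.strftime(DATE_FMT), end.strftime(DATE_FMT)
    if s_date == e_date:
        # Same day: ANDing the hour bounds onto the date is sufficient.
        return (f"({date_col} = '{s_date}' AND {hour_col} >= '{start.hour}'"
                f" AND {hour_col} < '{end.hour}')")
    # Multiple days: bound the hour only on the first and last day.
    return " OR ".join([
        f"({date_col} = '{s_date}' AND {hour_col} >= '{start.hour}')",
        f"({date_col} > '{s_date}' AND {date_col} < '{e_date}')",
        f"({date_col} = '{e_date}' AND {hour_col} < '{end.hour}')",
    ])


def anded_match(row_date, row_hour, start, end):
    """Row filter equivalent to the ANDed condition quoted in the issue."""
    s_date, e_date = start.strftime(DATE_FMT), end.strftime(DATE_FMT)
    return s_date <= row_date < e_date and start.hour <= row_hour < end.hour


def proposed_match(row_date, row_hour, start, end):
    """Row filter equivalent to the condition built by build_where."""
    s_date, e_date = start.strftime(DATE_FMT), end.strftime(DATE_FMT)
    if s_date == e_date:
        return row_date == s_date and start.hour <= row_hour < end.hour
    return ((row_date == s_date and row_hour >= start.hour)
            or s_date < row_date < e_date
            or (row_date == e_date and row_hour < end.hour))


# The midnight-to-midnight build from the issue.
start, end = datetime(2010, 8, 16), datetime(2010, 11, 1)
print(build_where("LOGS.CAL_DATE", "LOGS.CAL_HOUR", start, end))
print(anded_match("2010-09-01", 5, start, end))     # False: hour >= 0 AND hour < 0
print(proposed_match("2010-09-01", 5, start, end))  # True: a full middle day
```

With the midnight-to-midnight range, the ANDed filter rejects every row because no hour is both >= 0 and < 0, while the three-branch filter accepts all hours on the days strictly between the two dates and applies the hour bounds only on the first and last day.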