Puneet Gupta created LENS-1309:
----------------------------------
Summary: Add capability to specify that "Future Partitions" should
not be considered while answering queries
Key: LENS-1309
URL: https://issues.apache.org/jira/browse/LENS-1309
Project: Apache Lens
Issue Type: Improvement
Reporter: Puneet Gupta
Use case:
Let's say we have a Fact A which has DAILY and HOURLY update periods.
We have partitioned the fact on pt (process time) and et (event arrival
time).
Assume today is Sep 9th and, while processing data for the 23rd (last) hour
of Sep 8th (i.e., pt=2016-09-08-23), we find a few records with an event time
of Sep 9, hour 0 (due to clock synchronization issues, fraudulent data, etc.).
This will lead to partitions like pt=2016-09-08-23 and et=2016-09-09-00 at
HOUR level and pt=2016-09-08 and et=2016-09-09 at DAY level.
This makes the system believe that DAY level data for the 9th is available for
event time queries (as the timeline does not consider pt for event time
queries). This will lead to wrong query outputs, since the day partition
pt=2016-09-08 and et=2016-09-09 holds only a very small part of the 9th's data.
The major chunk of DAY data for the 9th will only get created on the morning of
the 10th (pt=2016-09-09 and et=2016-09-09). In this case LENS will answer the
query from the DAY update period for Sep 9th, while it should have used HOURLY
data for the 9th.
Expose a query level config to enforce/specify semantics that make sure LENS
considers et partitions only if they are <= the most recent pt partition.
Future partitions should be ignored at the higher granularity (DAY), and the
query should instead get answered from lower granularity data (HOUR). This
should also apply for lookahead.
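
The intended semantics can be sketched as follows. This is only an
illustration of the rule "consider an et partition only if it is <= the most
recent pt partition", not LENS code: the function name usable_et_partitions,
the ignore_future flag, and the sample partition lists are all hypothetical,
and partitions are modelled as plain (pt, et) string pairs.

```python
from datetime import datetime

HOUR_FMT = "%Y-%m-%d-%H"  # e.g. 2016-09-08-23
DAY_FMT = "%Y-%m-%d"      # e.g. 2016-09-08


def parse(value):
    """Parse an hourly (yyyy-MM-dd-HH) or daily (yyyy-MM-dd) partition value."""
    fmt = HOUR_FMT if value.count("-") == 3 else DAY_FMT
    return datetime.strptime(value, fmt)


def usable_et_partitions(partitions, ignore_future=True):
    """Return the sorted et values treated as available for one update period.

    partitions: list of (pt, et) string pairs registered at that update period.
    ignore_future: hypothetical stand-in for the proposed query level config.
    When set, an et partition counts only if it is <= the most recent pt.
    """
    if not partitions:
        return []
    latest_pt = max(parse(pt) for pt, _ in partitions)
    return sorted(
        {et for _, et in partitions
         if not ignore_future or parse(et) <= latest_pt}
    )


# DAY level on Sep 9th: the latest processed day is still Sep 8th.
daily = [
    ("2016-09-07", "2016-09-07"),
    ("2016-09-08", "2016-09-08"),
    ("2016-09-08", "2016-09-09"),  # stray "future" partition
]

# HOUR level on Sep 9th, after the first two hours have been processed.
hourly = [
    ("2016-09-08-23", "2016-09-08-23"),
    ("2016-09-08-23", "2016-09-09-00"),  # stray records seen a day early
    ("2016-09-09-00", "2016-09-09-00"),
    ("2016-09-09-01", "2016-09-09-01"),
]

print(usable_et_partitions(daily, ignore_future=False))
print(usable_et_partitions(daily))
print(usable_et_partitions(hourly))
```

With the flag off, et=2016-09-09 shows up as an available DAY partition, which
is the behaviour described in the use case. With the flag on, it is dropped at
DAY level while the HOUR level et partitions for the 9th remain usable, so a
query for the 9th would have to be answered from HOURLY data.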
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)