Bijeet, your understanding is correct, thanks for the comment. We had planned to release this feature in 0.8 for the streaming case. Now that we see the community needs it, we will back-port it to 0.7 and release it in 0.7.3 or 0.7.4.
Here are the retention-related JIRAs; I will link them together:
https://issues.apache.org/jira/browse/KYLIN-886
https://issues.apache.org/jira/browse/KYLIN-895
https://issues.apache.org/jira/browse/KYLIN-906

On 7/27/15, 2:47 AM, "Bijeet Singh" <[email protected]> wrote:

>From what I understand, a cube comprises multiple segments, and each
>segment is effectively a table in HBase. While querying, an HBaseKeyRange
>is created for each matching segment of the cube, and the results from
>the segments are finally merged. So it seems that truncating the HBase
>table corresponding to an older segment will not affect the other
>segments. Please correct me if I am wrong here.
>
>If it is indeed possible to truncate the older segments while maintaining
>the correctness of the cube, then the older data can effectively be
>deleted from the cube by truncating the corresponding HBase tables.
>
>This way, if I want to retain data for, say, around 60 days, I can have
>10 segments (given that 10 seems to be the optimal number of segments),
>each holding 6 days' worth of data. And once I have the 11th segment
>ready for the most recent 6 days, I can truncate the oldest segment.
>
>Please let me know if it looks feasible.
>
>Thanks,
>Bijeet
>
>On Sat, Jul 25, 2015 at 6:41 AM, vipul jhawar <[email protected]>
>wrote:
>
>> Sure, I will open a JIRA.
>>
>> So, at eBay you are storing the data forever in the cubes?
>>
>> Rebuilding the cube every several days seems very suboptimal, as it
>> means we have to spend a lot more resources again.
>> Even if I partitioned my cubes by month (e.g., cube_01, cube_02), I
>> would have to run parallel queries against all of them whenever my date
>> range spans months, and then re-aggregate in memory.
>>
>> On Fri, Jul 24, 2015 at 8:39 PM, Han, Luke <[email protected]> wrote:
>>
>> > Could you please open one JIRA for this? We have one for the
>> > streaming case, but I think it makes sense to enable retention for
>> > batch as well.
>> >
>> > Currently, I would say you have to rebuild the cube every several
>> > days to discard old data.
>> > To minimize the impact, you can define two cubes with the same logic,
>> > build one first, then build the other, say, 7 days later; once the
>> > new one is done, disable the old one and purge its data, then repeat
>> > again and again...
>> >
>> > Thanks.
>> >
>> > Sent from my iPhone
>> >
>> > > On Jul 24, 2015, at 22:22, vipul jhawar <[email protected]>
>> > > wrote:
>> > >
>> > > Hi
>> > >
>> > > I would be interested to know what solutions you would recommend to
>> > > implement data retention. Say we want to retain data for only up to
>> > > the last 90 days in the cube; what is the best option?
>> > >
>> > > Our daily size is > 60 GB, so we cannot store data forever and want
>> > > to limit it to a time range to support advanced analysis.
>> > >
>> > > Thanks
>> >
>>
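To make Bijeet's rolling-window scheme above concrete (10 segments of 6 days each for roughly 60 days of retention, dropping the oldest when the 11th is built), here is a minimal bookkeeping sketch. Only the numbers and the drop-the-oldest rule come from the thread; the function and variable names are hypothetical and this is not a Kylin API:

from datetime import date, timedelta

SEGMENT_DAYS = 6   # width of one segment, per Bijeet's example
MAX_SEGMENTS = 10  # 10 x 6 days = ~60 days of retention

def roll_segments(segments, today):
    """Append a new 6-day segment ending `today` and return the
    (possibly empty) list of segments that fall out of retention.
    Each segment is a (start, end) pair with `end` exclusive."""
    segments.append((today - timedelta(days=SEGMENT_DAYS), today))
    expired = segments[:-MAX_SEGMENTS] if len(segments) > MAX_SEGMENTS else []
    del segments[:len(expired)]
    return expired

# Build 11 consecutive segments: the 11th build evicts the oldest.
segments = []
start = date(2015, 7, 1)
for i in range(11):
    build_day = start + timedelta(days=SEGMENT_DAYS * (i + 1))
    for old in roll_segments(segments, build_day):
        # In Bijeet's proposal, this is the point where the HBase
        # table backing the expired segment would be truncated.
        print("expire segment", old)

Note that this only shows the window arithmetic; whether dropping the backing HBase table leaves the cube metadata consistent is exactly the question Bijeet raises in the thread.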

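For comparison, a sketch of the two-cube rotation Luke describes as the current workaround: two cubes with identical definitions, rebuilt alternately so one is always serving while the other is refreshed, after which the stale one is disabled and purged. KylinClient and all of its methods below are made-up stand-ins, not a real Kylin API; the real operations would go through the Kylin UI or the REST API of your version:

class KylinClient:
    # Hypothetical placeholder operations.
    def build(self, cube, start, end): ...
    def wait_until_ready(self, cube): ...
    def disable(self, cube): ...
    def purge(self, cube): ...

def rotate(client, serving, standby, window_start, window_end):
    """Rebuild `standby` over the retention window, then swap roles."""
    client.build(standby, window_start, window_end)
    client.wait_until_ready(standby)  # new cube is now queryable
    client.disable(serving)           # take the stale cube offline
    client.purge(serving)             # reclaim its HBase storage
    return standby, serving           # roles swapped for the next round

Each rotation rebuilds the full retention window from scratch, which is precisely the resource cost vipul objects to earlier in the thread, and why a native retention feature is being requested.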