From what I understand, a cube comprises multiple segments, and each segment
is effectively a table in HBase. While querying, an HBaseKeyRange is created
for each matching segment of the cube, and the results from the segments are
finally merged. So it seems that truncating the HBase table corresponding
to an older segment will not affect the other segments. Please correct me
if I am wrong here.
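
To make my mental model concrete, here is a toy sketch in Python. It is not
Kylin code: the segment dicts, the `table` field and the `query` function are
all hypothetical stand-ins for "one HBase table per segment, scan the matching
segments, merge the partial results".

```python
from collections import Counter
from datetime import date

# Hypothetical stand-in for a cube: each segment owns its own table,
# modelled here as a dict of {dimension_key: measure}.
segments = [
    {"start": date(2015, 7, 1), "end": date(2015, 7, 7), "table": {"US": 10, "IN": 4}},
    {"start": date(2015, 7, 7), "end": date(2015, 7, 13), "table": {"US": 3, "IN": 6}},
]

def query(segments, start, end):
    """Scan every segment overlapping [start, end) and merge the partial sums."""
    merged = Counter()
    for seg in segments:
        if seg["start"] < end and start < seg["end"]:
            merged.update(seg["table"])  # Counter.update adds the measures per key
    return dict(merged)
```

In this model, clearing the first segment's table (`segments[0]["table"].clear()`)
only removes that segment's contribution; the rows of the second segment are
untouched. Whether the real query path behaves the same way is exactly what I
would like confirmed.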

If it is indeed possible to truncate the older segments while maintaining
the correctness of the cube, then the older data can effectively be deleted
from the cube by truncating the corresponding HBase tables.

This way, if I want to retain data for, say, around 60 days, I can have 10
segments (given that 10 seems to be the optimal number of segments), each
having 6 days' worth of data. Once the 11th segment is ready for the
most recent 6 days, I can truncate the oldest segment.
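
The rolling window I have in mind can be sketched as follows. This is only an
illustration under my assumptions: `add_segment`, `SEGMENT_DAYS`,
`MAX_SEGMENTS` and the `truncate` callback are hypothetical names, and the
callback stands in for whatever would actually truncate the segment's HBase
table.

```python
from collections import deque
from datetime import date, timedelta

SEGMENT_DAYS = 6    # days of data per segment (assumed)
MAX_SEGMENTS = 10   # segments to retain, i.e. about 60 days (assumed)

def add_segment(window, start, truncate):
    """Append a new segment; truncate the oldest ones once the window is full."""
    window.append((start, start + timedelta(days=SEGMENT_DAYS)))
    dropped = []
    while len(window) > MAX_SEGMENTS:
        oldest = window.popleft()
        truncate(oldest)  # placeholder for truncating that segment's HBase table
        dropped.append(oldest)
    return dropped

# Build 11 consecutive segments and record which ones get truncated.
window = deque()
truncated = []
day = date(2015, 6, 1)
for _ in range(11):
    add_segment(window, day, truncated.append)
    day += timedelta(days=SEGMENT_DAYS)
```

After the 11th segment is added, the window holds 10 segments covering 60
days, and only the first segment (June 1 to June 7) has been passed to the
truncate callback.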

Please let me know if it looks feasible.

Thanks,
Bijeet

On Sat, Jul 25, 2015 at 6:41 AM, vipul jhawar <[email protected]>
wrote:

> Sure, I will open a JIRA.
>
> So, at eBay, are you storing the data in the cubes forever?
>
> Rebuilding the cube every several days seems very suboptimal, as it means we
> have to spend a lot more resources again.
> Even if I partitioned my cubes by date, one per month, such as cube_01,
> cube_02, I would have to run parallel queries against all of them when my
> date range spans months and then re-aggregate in memory.
>
> On Fri, Jul 24, 2015 at 8:39 PM, Han, Luke <[email protected]> wrote:
>
> > Could you please open a JIRA for this? We have one for the streaming case,
> > but I think it makes sense to enable retention for batch as well.
> >
> > Currently, I would say you have to rebuild the cube every several days to
> > discard old data.
> > To minimize the impact, you can define two cubes with the same logic and
> > build one first, then build the other about 7 days later. Once the new one
> > is done, disable the old one and purge its data, then repeat the cycle.
> >
> > Thanks.
> >
> > Sent from my iPhone
> >
> > > On Jul 24, 2015, at 22:22, vipul jhawar <[email protected]> wrote:
> > >
> > > Hi
> > >
> > > I would be interested to know what solutions you would recommend to
> > > implement data retention. Say we want to retain data for only up to the
> > > last 90 days in the cube; what is the best option?
> > >
> > > Our daily size is > 60 G, so we cannot store data forever and want to
> > > limit it to a time range to support advanced analysis.
> > >
> > > Thanks
> >
>
