Do we walk through all the data to find the old data?

If yes, we can do better using the indexes we plan to build for incremental
processing. Please talk to BuddikaChamith!
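
A minimal sketch of the difference, assuming a hypothetical day-bucketed index (row keys grouped by the day they were written). The actual index design for incremental processing is not described in this thread, so `DayBucketIndex` and the `written_on` field are made up for illustration, and plain Python dicts stand in for column families.

```python
from collections import defaultdict
from datetime import date, timedelta


class DayBucketIndex:
    """Hypothetical index: maps a day to the row keys written on that day."""

    def __init__(self):
        self._buckets = defaultdict(set)

    def add(self, row_key, written_on):
        self._buckets[written_on].add(row_key)

    def keys_older_than(self, cutoff):
        """Return row keys written strictly before `cutoff`.

        Only the index buckets are visited, never the data rows.
        """
        old = []
        for day in sorted(self._buckets):
            if day >= cutoff:
                break  # every remaining bucket is new enough to keep
            old.extend(sorted(self._buckets[day]))
        return old


def full_scan_older_than(rows, cutoff):
    """Baseline: walk every row and check its timestamp."""
    return sorted(k for k, r in rows.items() if r["written_on"] < cutoff)


# Toy data: one row per day for the last ten days.
today = date(2013, 1, 17)
rows = {}
index = DayBucketIndex()
for i in range(10):
    key = f"row-{i}"
    written = today - timedelta(days=i)
    rows[key] = {"written_on": written, "payload": i}
    index.add(key, written)

# Anything written before this date counts as old data.
cutoff = today - timedelta(days=7)
old_keys = index.keys_older_than(cutoff)
```

Both approaches find the same keys. The index lookup stops at the first bucket that is new enough, while the full scan touches every row regardless of age.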


On Wed, Jan 16, 2013 at 11:43 PM, Kasun Weranga <[email protected]> wrote:

> Hi,
>
> I am going to do the $subject. The plan is to use map-reduce jobs for the
> purging and archival process, so that it can handle purging/archiving large
> amounts of data. The feature can purge/archive data manually for a
> specified duration, and can also purge/archive older data automatically
> (keeping only the data of the last N days). The user can select the stream
> name, version and the duration of the data to archive.
>
> Here I have described the model that we came up with to achieve the above
> functionality.
>
> 1. Select the data in the specified time duration and insert it into a
> column family. (use a Hive query)
> 2. Insert the rowkeys of the selected data into a temporary CF (use a Hive query)
> -  If we have a large amount of data to archive, we can't keep all the
> rowkeys in memory, so we insert them into a temp CF.
> 3. Write a class analyzer to read the rowkeys from the temporary CF and then
> delete the data in the original CF (use custom map-reduce jobs). Finally, we
> will delete the temporary CF.
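
The three steps above can be sketched as a toy simulation. This is not Hive or Cassandra code: plain dicts stand in for column families, the function name `archive_and_purge` and the `ts` field are hypothetical, and the half-open range `[start, end)` is an assumption, since the thread does not say how the duration bounds are treated. In the proposed design, steps 1 and 2 would be Hive queries and step 3 a custom map-reduce job.

```python
def archive_and_purge(original_cf, archive_cf, start, end):
    """Toy version of the three-step model; returns the number of rows moved."""
    # Step 1: copy the rows whose timestamp falls in [start, end)
    # into the target column family.
    selected = {k: r for k, r in original_cf.items() if start <= r["ts"] < end}
    archive_cf.update(selected)

    # Step 2: record the selected row keys in a temporary CF. In the real
    # design this avoids holding all keys in memory; here it is just a dict.
    temp_cf = {k: None for k in selected}

    # Step 3: read the keys back from the temporary CF, delete those rows
    # from the original CF, then drop the temporary CF.
    for key in list(temp_cf):
        original_cf.pop(key, None)
    temp_cf.clear()

    return len(selected)


# Six events with timestamps 0..5; archive and purge those in [0, 3).
events = {f"k{i}": {"ts": i, "value": i * 10} for i in range(6)}
archive = {}
moved = archive_and_purge(events, archive, start=0, end=3)
```

After the call, the archive holds the selected rows and the original CF holds only the rest. Copying before deleting means a failure partway through leaves data duplicated rather than lost.
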
>
>
> The class analyzer can be included in a Hive script, so
> purging/archiving can be done using a single Hive script that can be
> generated programmatically. An advantage of this is that we can reuse the
> scheduling functionality implemented for Hive scripts as well.
>
> This is the model we came up with for archiving/purging Cassandra data in BAM.
> If you have any concerns, please raise them.
>
> Thanks,
> KasunW.
>
> _______________________________________________
> Architecture mailing list
> [email protected]
> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>
>


-- 
============================
Srinath Perera, Ph.D.
   http://www.cs.indiana.edu/~hperera/
   http://srinathsview.blogspot.com/
_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev
