Hi Malith,

The current (Hive-based) solution, and it seems the proposed one as well, only handles Column Families (CFs) created and maintained by BAM (based on the stream-def). A couple of improvements would really help:

- Currently, the archiving configuration is per 'CF + stream-def-version'. Is it possible to have just one archive configuration that takes care of a given CF irrespective of the stream-def-version?
- The archiving feature should support any CF that exists in a given Cassandra cluster. We are currently using Cassandra (instead of an RDBMS like MySQL) to store analyzed data. Of course, the configuration would need to have the name of the 'timestamp' column for each CF, based on which the data would be filtered for archiving.
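To make the two points above a bit more concrete, here is a rough sketch of what a per-CF archive configuration could look like. It is only an illustration and not BAM or Hector code: `ArchiveConfig`, `rows_to_archive`, the field names, and the assumption that timestamps are stored as epoch milliseconds are all hypothetical. Rows are modelled as plain in-memory (key, columns) pairs.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple

Row = Tuple[str, Dict[str, object]]  # (row key, column name -> value)

MILLIS_PER_DAY = 24 * 60 * 60 * 1000


@dataclass(frozen=True)
class ArchiveConfig:
    """Hypothetical archive settings for one CF, regardless of stream-def version."""
    column_family: str     # any CF in the cluster, not only BAM-created ones
    timestamp_column: str  # column holding the event time (assumed epoch millis)
    retention_days: int    # rows older than this are eligible for archiving


def rows_to_archive(config: ArchiveConfig, rows: Iterable[Row],
                    now_millis: int) -> Iterator[Row]:
    """Yield the rows whose timestamp column is older than the retention cutoff."""
    cutoff = now_millis - config.retention_days * MILLIS_PER_DAY
    for key, columns in rows:
        ts = columns.get(config.timestamp_column)
        # Rows without a usable timestamp are left in place rather than guessed at.
        if isinstance(ts, int) and ts < cutoff:
            yield key, columns
```

Note that this filter walks every row it is given. Against a real CF that would be a full scan, which is why the question of indexing the timestamp column matters.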
For the Hector-based implementation, I would imagine that 'non-secondary' indexing on the 'timestamp' column would be required to efficiently filter and archive the data. If you agree, how do you folks plan to handle this? If it is not required, how would the solution scale and perform without indexing?

Also, in addition to archiving data from Cassandra (ActiveStore) to Cassandra (ArchiveStore), shouldn't it support archiving to traditional SAN-like storage options, HDFS, etc.? I think these other options could easily and naturally be supported by Hive itself, where the Hive result could be streamed as key-value pairs to these types of archive stores.

Regards,
Dipesh

Malith Dhanushka wrote
> Hi folks,
>
> We(BAM team, Sumedha) had a discussion about the $Subject and following
> are the suggested improvements for the Cassandra data archival feature in
> BAM.
>
> - Remove hive script based archiving and use hector API to directly issue
> archive queries to Cassandra (Current implementation is based
> on hive where it generates hive script and archiving process uses
> map-reduce jobs to achieve the task and it has a limitation of discarding
> custom key value pares in column family)
>
> - Use Task component for scheduling purposes
>
> - Archive data to external Cassandra ring
>
> - Major UI improvements
>   - List the current archiving tasks
>   - Edit, Remove and Schedule archiving tasks
>   - Add new archiving task
>
> If there is any additional requirements please raise.
>
> Thanks,
> Malith
> --
> Malith Dhanushka
>
> Engineer - Data Technologies
> *WSO2, Inc. : wso2.com*
>
> *Mobile* : +94 716 506 693
>
> _______________________________________________
> Architecture mailing list
> Architecture@
> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture

--
View this message in context: http://wso2-oxygen-tank.10903.n7.nabble.com/BAM-Data-Archival-Feature-improvements-tp85315p85330.html
Sent from the WSO2 Architecture mailing list archive at Nabble.com.
