Hi Malith,

The current (Hive-based) solution, and it seems the proposed solution as
well, only handles Column Families (CFs) created and maintained by BAM
(based on the stream-def). A couple of improvements would really help:
 - Currently, the archiving configuration is per 'CF + stream-def-version'.
Is it possible to have just one archive configuration that takes care of a
given CF irrespective of the stream-def-version?
 - The archiving feature should support any CF that exists in a given
Cassandra cluster. We are currently using Cassandra (instead of an RDBMS
like MySQL) to store analyzed data. Of course, the configuration would need
to include the name of the 'timestamp' column for each CF, based on which
the data would be filtered for archiving.
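
To make the second point concrete, here is a minimal sketch of what such a
per-CF configuration could look like. All names here (ArchiveConfig,
should_archive, retention_days) are hypothetical and not part of BAM; the
retention period and the epoch-millisecond timestamps are my assumptions.

```python
from dataclasses import dataclass

DAY_MS = 24 * 60 * 60 * 1000


@dataclass(frozen=True)
class ArchiveConfig:
    """Hypothetical archive configuration: one per CF, no stream-def version."""
    keyspace: str
    column_family: str
    timestamp_column: str   # name of the column holding the row's timestamp
    retention_days: int     # assumption: rows older than this get archived


def should_archive(config, row, now_ms):
    """Return True if the row's timestamp is older than the retention cutoff.

    `row` is a plain dict of column name -> value; the timestamp is assumed
    to be epoch milliseconds. Rows without the timestamp column are skipped.
    """
    ts = row.get(config.timestamp_column)
    if ts is None:
        return False
    cutoff_ms = now_ms - config.retention_days * DAY_MS
    return ts < cutoff_ms
```

Since the configuration is keyed only by keyspace and CF name, it would
apply equally to BAM-created CFs and to CFs holding analyzed data.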

For the Hector-based implementation, I would imagine that 'non-secondary'
indexing on the 'timestamp' column would be required to filter and archive
the data efficiently. If you agree, how do you folks plan to handle this? If
it is not required, how would the solution scale and perform without
indexing?
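
By 'non-secondary' indexing I mean something like a manually maintained,
time-bucketed index. The sketch below only illustrates the idea with an
in-memory stand-in; it makes no Hector or Cassandra calls, and the class
name, the day-sized buckets and the epoch-millisecond timestamps are all
assumptions of mine.

```python
from collections import defaultdict

DAY_MS = 24 * 60 * 60 * 1000


class TimeBucketIndex:
    """In-memory stand-in for an index CF: one 'row' per day bucket."""

    def __init__(self):
        # bucket (day number) -> list of (timestamp_ms, row_key)
        self._buckets = defaultdict(list)

    def add(self, row_key, timestamp_ms):
        """Record a data row under the day bucket its timestamp falls in."""
        self._buckets[timestamp_ms // DAY_MS].append((timestamp_ms, row_key))

    def keys_older_than(self, cutoff_ms):
        """Return row keys with timestamp < cutoff_ms, oldest first.

        Only buckets up to the cutoff's bucket are visited, so newer data
        is never scanned.
        """
        cutoff_bucket = cutoff_ms // DAY_MS
        result = []
        for bucket in sorted(self._buckets):
            if bucket > cutoff_bucket:
                break
            for ts, key in sorted(self._buckets[bucket]):
                if ts < cutoff_ms:
                    result.append(key)
        return result
```

With something like this, an archive run would read only the old buckets
instead of iterating over every row in the CF.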

Also, in addition to archiving data from Cassandra (ActiveStore) to
Cassandra (ArchiveStore), shouldn't it support archiving to traditional
SAN-like storage options, HDFS, etc.?
I think these other options could be supported easily and naturally by Hive
itself, where the Hive result could be streamed as key-value pairs to these
types of archive stores.

Regards,
Dipesh


Malith Dhanushka wrote
> Hi folks,
> 
> We (BAM team, Sumedha) had a discussion about the $Subject, and the
> following are the suggested improvements for the Cassandra data archival
> feature in BAM.
> 
> - Remove Hive-script-based archiving and use the Hector API to issue
> archive queries directly to Cassandra. (The current implementation is based
> on Hive: it generates a Hive script, and the archiving process uses
> map-reduce jobs to achieve the task. It has the limitation of discarding
> custom key-value pairs in the column family.)
> 
> - Use Task component for scheduling purposes
> 
> - Archive data to external Cassandra ring
> 
> - Major UI improvements
>     - List the current archiving tasks
>     - Edit, Remove and Schedule archiving tasks
>     - Add new archiving task
> 
> If there are any additional requirements, please raise them.
> 
> Thanks,
> Malith
> -- 
> Malith Dhanushka
> 
> Engineer - Data Technologies
> *WSO2, Inc. : wso2.com*
> 
> *Mobile*          : +94 716 506 693
> 
> _______________________________________________
> Architecture mailing list
> Architecture@
> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture

--
View this message in context: 
http://wso2-oxygen-tank.10903.n7.nabble.com/BAM-Data-Archival-Feature-improvements-tp85315p85330.html
Sent from the WSO2 Architecture mailing list archive at Nabble.com.
