Hi folks,

The BAM data archival feature deletes the archived rows from the original Cassandra CF. But in Cassandra, a delete doesn't immediately wipe out all traces of the removed data. Instead, Cassandra replaces the data with a special marker called a tombstone. In effect, the row id is kept while all of its column values read back as null. So after archiving, if someone runs a Hive script on that CF, those rows with a row id but all-null columns trigger exceptions when writing to the RDBMS.
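To make the failure mode concrete, here is a minimal sketch (in Python for brevity, with hypothetical row/column names) of what the Hive-to-RDBMS step effectively runs into: a tombstoned row keeps its row id, but every column value comes back as null.

```python
# Hypothetical rows read back from the CF after archival: each row is
# (row_id, {column: value}); a deleted row survives as a tombstone whose
# row id is intact but whose column values are all None.
rows = [
    ("row-1", {"timestamp": "1396305600000", "payload": "event-a"}),
    ("row-2", {"timestamp": None, "payload": None}),  # tombstoned row
]

def to_rdbms_values(columns):
    # The RDBMS writer expects a numeric timestamp, so a tombstoned
    # row fails right here with a TypeError.
    return (int(columns["timestamp"]), columns["payload"])

for row_id, columns in rows:
    try:
        print(row_id, "->", to_rdbms_values(columns))
    except TypeError:
        print(row_id, "-> write failed: null values (tombstone)")
```

Checking a mandatory column such as timestamp for null before the write is essentially what a programmatic skip of these rows would amount to.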
This is actually a feature in Cassandra that supports eventual consistency across replicas. The data can't be removed outright on delete; instead, a marker (tombstone) is written to indicate the value's new status. On the first compaction that covers both the data and the tombstone, the data is removed completely and the corresponding disk space reclaimed. There is a property called GCGraceSeconds, configurable per CF, which specifies how long to wait before garbage-collecting tombstones (the default is 10 days). In many deployments this interval can be reduced, and in a single-node cluster it can safely be set to zero.

Considering the above facts, there are a couple of alternatives we can think of:

1. Fine-tune (reduce) GCGraceSeconds and tell users to run their Hive scripts only after that interval has passed since the archival run. But both the Hive scripts and the archival process have their own schedules, so keeping them in sync could get messy.

2. Programmatically check for the Cassandra null values (using a mandatory column such as timestamp) and skip those rows when writing to the RDBMS. But this is a bit tricky when it comes to Cassandra wide-row operations.

Any ideas on this?

Thanks,
Malith

--
Malith Dhanushka
Engineer - Data Technologies
WSO2, Inc. : wso2.com
Mobile : +94 716 506 693
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
