Cleanup in a write-only environment
In my understanding, cleanup is meant to help clear out data that has been removed. If you have an environment where data is only ever added (the case for the production system I'm working with), is there a point to automating cleanup?

I understand that if we were ever to purge a segment of data from our cluster we'd certainly want to run it, or after adding a new node and adjusting the tokens. I want to make sure I'm not missing something here: are there other reasons to run cleanup regularly?

--
David McNelis
Lead Software Engineer
Agentis Energy
www.agentisenergy.com
c: 219.384.5143

A Smart Grid technology company focused on helping consumers of energy control an often under-managed resource.
Re: Cleanup in a write-only environment
I believe you are misunderstanding what cleanup does. Cleanup is used to remove data from a node that the node no longer owns. For example, when you move a node in the ring, its range of responsibility changes and it receives new data, but it does not automatically delete the data it used to be responsible for. In this situation, you run cleanup to delete all of that old data.

Data that has been deleted or has expired gets removed automatically as compaction runs.

On Wed, Nov 30, 2011 at 7:24 AM, David McNelis dmcne...@agentisenergy.com wrote:
> In my understanding Cleanup is meant to help clear out data that has been removed. [...]
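To make the ownership point in this reply concrete, here is a minimal Python sketch of a toy token ring. It is not Cassandra's implementation: the 0..99 token space, the node names, the `owner` and `cleanup` helpers, and the replication factor of 1 are all assumptions made up for illustration. It only shows why a node keeps stale data after a ring change until something like cleanup runs.

```python
from bisect import bisect_left

def owner(token, ring):
    """Return the node that owns `token` in a toy ring.

    `ring` maps each node's token to its name. A node owns the range
    (previous node's token, its own token], wrapping around at the end.
    """
    tokens = sorted(ring)
    i = bisect_left(tokens, token)          # first node token >= token
    return ring[tokens[i % len(tokens)]]    # wrap to the first node if none

def cleanup(node, stored_tokens, ring):
    """Hypothetical stand-in for cleanup: keep only what `node` still owns."""
    return {t for t in stored_tokens if owner(t, ring) == node}

# Two-node ring; every write goes to the single owner (replication factor 1).
ring = {49: "A", 99: "B"}
data = {"A": set(), "B": set()}
for t in (5, 20, 30, 45, 60, 80):
    data[owner(t, ring)].add(t)

# Add node C at token 24. C takes over 0..24 and receives a copy of that data,
# but A is not told to delete anything.
ring[24] = "C"
data["C"] = {t for t in data["A"] if owner(t, ring) == "C"}
before = set(data["A"])

# Only an explicit cleanup pass drops the rows A no longer owns.
data["A"] = cleanup("A", data["A"], ring)
print("A before cleanup:", sorted(before))
print("A after cleanup: ", sorted(data["A"]))
```

In this toy model nothing was ever deleted by the application, yet node A still ends up holding rows it does not own, which is the case cleanup is for.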
Re: Cleanup in a write-only environment
Your understanding of nodetool cleanup is not correct. Cleanup is only needed after rebalancing the cluster, for example after adding or removing nodes. It removes data that no longer belongs on the node (in older versions it removed hints as well).

What you are really asking is whether you need to run compaction. In a write-only workload you should, in most cases, let Cassandra do its normal compaction.

On Wednesday, November 30, 2011, David McNelis dmcne...@agentisenergy.com wrote:
> In my understanding Cleanup is meant to help clear out data that has been removed. [...]
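For contrast with cleanup, the following Python sketch shows roughly what compaction does with deleted data. It is a deliberately simplified model, not Cassandra's code: the `compact` function, the dict-based "SSTables", and the use of `None` as a tombstone are assumptions for illustration, and it drops tombstones immediately, whereas Cassandra waits until `gc_grace_seconds` has passed.

```python
def compact(sstables):
    """Merge toy SSTables into one.

    Each SSTable is a dict of key -> (timestamp, value); a value of None
    stands in for a tombstone (a delete marker). The newest timestamp wins.
    """
    merged = {}
    for table in sstables:
        for key, (ts, value) in table.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    # Simplification: discard tombstones right away (no grace period).
    return {k: v for k, v in merged.items() if v[1] is not None}

# Append-only workload: compaction merges files but has nothing to discard.
append_only = [{"a": (1, "x")}, {"b": (2, "y")}]
merged_append_only = compact(append_only)

# Workload with a delete: the tombstone for "a" shadows the older write.
with_delete = [{"a": (1, "x"), "b": (2, "y")}, {"a": (3, None)}]
merged_with_delete = compact(with_delete)

print(merged_append_only)
print(merged_with_delete)
```

Under these assumptions, an append-only workload loses nothing to compaction, and deleted rows disappear without any cleanup run.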
Re: Cleanup in a write-only environment
Thanks, folks. I think I must have read compaction, thought cleanup, and gotten muddled from there.

David

On Nov 30, 2011 6:45 PM, Edward Capriolo edlinuxg...@gmail.com wrote:
> Your understanding of nodetool cleanup is not correct. [...]