Cleanup in a write-only environment

2011-11-30 Thread David McNelis
In my understanding, cleanup is meant to help clear out data that has been
removed.  If you have an environment where data is only ever added (the
case for the production system I'm working with), is there a point to
automating cleanup?  I understand that if we were ever to purge a segment
of data from our cluster we'd certainly want to run it, or after adding a
new node and adjusting the tokens.

So I want to make sure I'm not missing something here: are there other
reasons to run cleanup regularly?

-- 
*David McNelis*
Lead Software Engineer
Agentis Energy
www.agentisenergy.com
c: 219.384.5143

*A Smart Grid technology company focused on helping consumers of energy
control an often under-managed resource.*


Re: Cleanup in a write-only environment

2011-11-30 Thread Nick Bailey
I believe you are misunderstanding what cleanup does. Cleanup is used
to remove data from a node that the node no longer owns. For example
when you move a node in the ring, it changes responsibility and gets
new data, but does not automatically delete the data it used to be
responsible for but no longer is. In this situation, you run cleanup
to delete all of that old data.
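
As a rough illustration of the point above, here is a toy Python model of
ownership on a token ring. It is only a sketch, not Cassandra's actual
code: the names owner and cleanup are hypothetical, tokens are small
integers, and replication is ignored (each token has exactly one owner).

```python
from bisect import bisect_left

def owner(token, node_tokens):
    """Return the node token that owns `token` on a toy ring.

    Assumption: a node owns the range (previous_token, its_token],
    and the ring wraps around past the highest token.
    """
    ring = sorted(node_tokens)
    idx = bisect_left(ring, token)
    return ring[idx % len(ring)]

def cleanup(local_rows, my_token, node_tokens):
    """Keep only the rows whose token this node still owns."""
    return {t: v for t, v in local_rows.items()
            if owner(t, node_tokens) == my_token}

# A node sits at token 50 in the ring {0, 50, 100}, so it owns (0, 50].
rows = {10: "a", 30: "b", 45: "c"}
before_move = cleanup(rows, 50, [0, 50, 100])

# The node is moved to token 40 and now owns only (0, 40].
# Row 45 is still on disk until cleanup runs, although the node at 100 owns it.
after_move = cleanup(rows, 40, [0, 40, 100])
```

In this toy model nothing is dropped before the move, and only row 45 is
dropped after it. In an append-only cluster whose ring never changes,
cleanup would have nothing to remove.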

Data that has been deleted/expired will get removed automatically as
compaction runs.
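
A similarly simplified sketch of that idea, in the same toy Python style.
The function name compact and the (value, written_at) row layout are
assumptions made for illustration; a value of None stands in for a delete
marker, and gc_grace_seconds is used here as the age after which a delete
marker may be dropped. Real compaction checks more conditions than this.

```python
def compact(sstables, now, gc_grace_seconds):
    """Merge toy sstables, newest write wins, then drop old delete markers.

    Each sstable is a dict of key -> (value, written_at); value None
    represents a delete marker in this sketch.
    """
    merged = {}
    for table in sstables:
        for key, (value, written_at) in table.items():
            if key not in merged or written_at > merged[key][1]:
                merged[key] = (value, written_at)
    # Drop delete markers that are older than the grace period.
    return {k: (v, ts) for k, (v, ts) in merged.items()
            if not (v is None and now - ts >= gc_grace_seconds)}

older = {"k1": ("x", 100), "k2": ("y", 100)}
newer = {"k1": (None, 200)}  # k1 was deleted at time 200

too_early = compact([older, newer], now=300, gc_grace_seconds=500)
later = compact([older, newer], now=1000, gc_grace_seconds=500)
```

Here too_early still carries the delete marker for k1, while later
contains only k2. No separate cleanup step is involved in either case.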



Re: Cleanup in a write-only environment

2011-11-30 Thread Edward Capriolo
Your understanding of nodetool cleanup is not correct. Cleanup is used only
after cluster rebalancing, such as adding or removing nodes. It removes data
that no longer belongs on the node (in older versions it removed hints as
well).

What you are really debating is whether you need to run compaction. In a
write-only workload you should let Cassandra do its normal compaction (in
most cases).



Re: Cleanup in a write-only environment

2011-11-30 Thread David McNelis
Thanks, folks.

I think I must have read compaction, thought cleanup, and gotten muddled
from there.

David