Hi everybody,

We have some problems running repairs on a timely schedule. We have a three-node deployment, and every week we start a repair on one node, repairing one column family at a time. However, when we reach the big column families, the repair sessions usually hang indefinitely, and we have to restart them manually.

The script runs commands like:

nodetool repair keyspace columnfamily

one by one.
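
A minimal sketch of such a script, with a per-column-family timeout so that a hung session does not block the rest of the run. The function names, the timeout value and the injectable `runner` parameter are assumptions made for illustration, not part of the original script. Note that giving up on the nodetool client may not stop the repair session on the server.

```python
import subprocess


def repair_command(keyspace, column_family):
    """Build the nodetool invocation shown above as an argument list."""
    return ["nodetool", "repair", keyspace, column_family]


def run_repairs(keyspace, column_families, timeout_s, runner=subprocess.run):
    """Repair column families one by one.

    Returns the column families whose repair timed out or exited non-zero,
    so they can be retried or inspected by hand.
    """
    unfinished = []
    for cf in column_families:
        try:
            result = runner(repair_command(keyspace, cf), timeout=timeout_s)
            if result.returncode != 0:
                unfinished.append(cf)
        except subprocess.TimeoutExpired:
            # The client stopped waiting; the server-side session may still run.
            unfinished.append(cf)
    return unfinished
```

Calling `run_repairs("keyspace", ["cf_small", "cf_big"], timeout_s=6 * 3600)` (hypothetical names and timeout) would return the list of column families that still need attention.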

This has not been a major issue for some time, since we never delete data; however, we would like to sort the issue out once and for all.

Reading resources on the net, I came to the conclusion that we could:

1) either run a repair session like the one above, but with the -pr switch, and run it on every node, not just on one;
2) or run subrange repair as described here http://www.datastax.com/dev/blog/advanced-repair-techniques , which would be the best option. However, the latter procedure would require us to write a Java program that calls describe_splits to get the tokens to feed nodetool repair with.
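
To make the second option concrete, here is a rough sketch that generates subrange repair commands. It is a simplification: instead of calling describe_splits, it divides a token range evenly by plain arithmetic, so the subranges are equal in token width, not in data size. The start and end tokens are assumed to be supplied by hand (for example, read from `nodetool ring`), and the function names are hypothetical.

```python
def split_token_range(start, end, parts):
    """Split the token range from start to end into contiguous subranges."""
    width = end - start
    if parts < 1 or width < parts:
        raise ValueError("need 1 <= parts <= end - start")
    # Integer arithmetic keeps large tokens exact.
    bounds = [start + width * i // parts for i in range(parts + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def subrange_repair_commands(keyspace, column_family, start, end, parts):
    """Build one 'nodetool repair -st ... -et ...' command per subrange."""
    return [
        f"nodetool repair -st {s} -et {e} {keyspace} {column_family}"
        for s, e in split_token_range(start, end, parts)
    ]
```

Each generated command repairs only a slice of the range, so a hang affects one small slice and not a whole column family.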

Is it true that the second procedure is available out of the box only in the commercial version of OpsCenter?

I would like to know if these are the current best practices for repairs, or if there is some other option that makes repair easier to perform and more reliable than it is now.

Regards,

Paolo Crosato

--
Paolo Crosato
Software engineer/Custom Solutions
e-mail: paolo.cros...@targaubiest.com
