2) a practical/situational view of managing a Cassandra cluster
...
it would be nice to have a more comprehensive deployment guide.
You're right.  Maybe we can get Digg to share theirs. :)
We don't have any such thing. The deployment at Digg is just as alpha as the deployment anywhere else. The database team is still trying to figure out how to tune, monitor/alert on, and deploy the cluster. So far it's chaotic.

We have no experience with what to do when a node fails, a rack fails, or a datacentre fails.

Our answer to data corruption has been "lose that data, hope the bug was fixed, redeploy next version up."

Our answer to "Cassandra performance has degraded in an unusual fashion" has been to shut Cassandra down and work on an upgrade path.

If anything, I might advise an entity undertaking a Cassandra deployment to "have developers on staff who can help you administer the cluster by way of hacking the source code" because, honestly, that's how we've done it thus far.

I expect once Cassandra features, architecture, and bugginess stabilise (I understand we're on the cusp of that now), the database team at Digg will take nearly 100% responsibility for the cluster, and at that point we will write extensive documentation about administering the cluster. My estimate is 3-9 months from now.

I guess since this is the users survey thread, I should list what I wish I had. I would love to have a CLI that can tell me:

        1. What's the keyspace?
        2. What column families exist?
        3. What supercolumns exist?
        4. What columns are part of a particular supercolumn?
        5. What is the key range for a given column family?
        6. What are the last N rows in this column family?
        7. What are the first N rows?
        8. If I query a key range M..N, what nodes would likely answer?
        9. For a given structure I can see, what is the underlying
           directory, file, and memory structure? What SSTables make up
           this column family? Which are compacted? What are their
           sizes? How many tombstones are in each? Etc.
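
To make question 8 concrete, here is a rough sketch of the kind of lookup such a CLI might perform. It is not how Cassandra itself does it. It assumes an order-preserving layout where a key's token is the key string itself, each node owns the keys in (previous token, its own token], and replicas are ignored. The ring contents, node names, and function names are all made up for illustration.

```python
from bisect import bisect_left

# Hypothetical ring of (token, node) pairs, sorted by token.
# Assumption: each node owns keys in (previous token, its own token],
# and keys past the last token wrap around to the first node.
RING = [("g", "node-a"), ("n", "node-b"), ("t", "node-c"), ("z", "node-d")]


def nodes_for_range(start, end, ring=RING):
    """Return the primary owners likely to answer a query for keys start..end.

    Assumes start <= end and ignores replication entirely.
    """
    tokens = [token for token, _ in ring]
    first = bisect_left(tokens, start)  # first token >= start
    last = bisect_left(tokens, end)     # first token >= end (may be len(tokens))
    # Walk the ring from first to last, wrapping past the final token.
    owners = [ring[i % len(ring)][1] for i in range(first, last + 1)]
    # Drop repeats caused by wrapping while keeping ring order.
    return list(dict.fromkeys(owners))


print(nodes_for_range("h", "p"))
```

With the made-up ring above, the range h..p falls across the second and third nodes. A real tool would also have to account for the partitioner in use and the replication strategy.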

I would want this all from the point of view of a CLI. I would not want to have to log in to any particular node via a shell to ask these questions (so "Just look at the XML config file!" is not the proper answer).

Think of a "shell" client of Cassandra that allows exploration and navigation by way of Cassandra-specific ls, cd, ps, cat, head, tail.
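
As a toy illustration of that idea, the sketch below navigates an in-memory nested dict standing in for keyspace, column family, and row. Nothing here talks to a real cluster. The `Explorer` class, the `TREE` data, and the command set are all hypothetical, and a real client would have to fetch this information from the cluster rather than a dict.

```python
import shlex

# Hypothetical stand-in for cluster contents: keyspace -> column family -> row -> columns.
TREE = {
    "Keyspace1": {
        "Users": {
            "alice": {"email": "alice@example.com"},
            "bob": {"email": "bob@example.com"},
            "carol": {"email": "carol@example.com"},
        },
    },
}


class Explorer:
    """Minimal ls/cd/cat/head/tail over a nested dict."""

    def __init__(self, tree):
        self.tree = tree
        self.path = []  # current location, e.g. ["Keyspace1", "Users"]

    def _node(self, path):
        node = self.tree
        for part in path:
            node = node[part]  # raises KeyError on an unknown name
        return node

    def run(self, line):
        cmd, *args = shlex.split(line)
        if cmd == "cd":
            target = self.path[:-1] if args[0] == ".." else self.path + [args[0]]
            self._node(target)  # validate before moving
            self.path = target
            return "/" + "/".join(self.path)
        node = self._node(self.path)
        if cmd == "ls":
            return sorted(node)
        if cmd in ("head", "tail"):
            n = int(args[0]) if args else 10
            keys = sorted(node)
            return keys[:n] if cmd == "head" else keys[-n:]
        if cmd == "cat":
            return node[args[0]]
        raise ValueError(f"unknown command: {cmd}")


sh = Explorer(TREE)
for line in ("ls", "cd Keyspace1", "cd Users", "head 2", "cat alice"):
    print(line, "->", sh.run(line))
```

Here `head` and `tail` sort keys alphabetically, which only mirrors what a cluster would return if its keys were stored in that order.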

--
timeless
