Hello all. I'm looking for some advice on whether Coda would be
appropriate for my situation. I have read the FAQ and docs, but it
seems that most people are using Coda for a different application and
all the docs have that in mind. It may be that nobody uses Coda the way
I'm thinking of because it's a stupid idea, so that's why I'm asking!
At present we have a small server farm which processes web server logs
for a large number of websites (in the tens of thousands). The way this
works is we have seven processing servers which do nothing but parse
logs and generate reports. The reports, once created, are saved forever
to a storage server (via NFS) which is directly attached to a big
(~1.9TB) array.
There are two problems here. First, if the storage server goes offline
for whatever reason (like if NFS decides to flake out for a few
seconds), all the processing servers hang on the mount and have to be
power cycled. This is Very Bad. And second, the processing servers all
need to fetch the actual logs and store them on a local filesystem
(because NFS is far too slow). These logs can get very large, so if a
bunch of requests arrive all at once - like when the monthly reports are
automatically generated - the servers tend to run out of disk space and die.
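(For what it's worth, the hang-until-power-cycle behavior is what hard
NFS mounts do by design: the client blocks retrying until the server
comes back. As a stopgap while we evaluate alternatives, a soft,
interruptible mount at least lets hung processes be killed. A
hypothetical /etc/fstab line - the server name and timeouts here are
just an illustration, not our actual config - would look like:

```
# soft: fail I/O after the retry budget instead of blocking forever
# intr: allow signals to interrupt blocked NFS operations
# timeo=30 is in tenths of a second, so ~3s per try, retrans=3 tries
storage:/reports  /mnt/reports  nfs  soft,intr,timeo=30,retrans=3  0  0
```

Note that soft mounts can silently fail writes that time out, so this
trades one failure mode for another; it's a band-aid, not a fix.)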
I am thinking that Coda might be able to provide a solution to these
problems - a better one than the one we are using now, anyway, which is
"throw more hideously expensive servers at the problem and hope it goes
away."
My rough-sketch thought process is that, naturally, the storage server
would still provide all the disk space. Each Coda client (the
processing servers) would make a cache about the size of the partition
it's using now for its temporary files (anywhere from 50-140GB).
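If I understand the client setup right, the cache size is set in
venus.conf; something like the following (the exact option name and
units should be double-checked against the Coda docs - I believe
cacheblocks is counted in 1KB blocks, so ~100GB would be on the order
of 100 million blocks, and whether venus actually copes with a cache
that large is exactly the kind of thing I'm asking about):

```
# /etc/coda/venus.conf (hypothetical values)
# cacheblocks is (I believe) in 1KB blocks; ~100GB cache:
cacheblocks=100000000
```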
The first problem would be solved, or at least mitigated, because the
Coda clients could keep doing their thing in disconnected mode if the
storage server crapped out for a while. Most of the time the processing
servers run at about 40% disk capacity, so that should leave enough
space for at least an hour or two of disconnected operation. If the
array itself fails, we're pretty screwed anyway, but at least we could
keep processing requests for the time it takes to reinitialize the thing.
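(The "hour or two" figure is just back-of-the-envelope; here is the
arithmetic, where the 30GB/hour write rate is a made-up placeholder
and not a measured number:

```python
def headroom_hours(cache_gb, used_fraction, write_gb_per_hour):
    """Rough hours of disconnected operation before the client cache fills."""
    free_gb = cache_gb * (1.0 - used_fraction)
    return free_gb / write_gb_per_hour

# e.g. a 100GB cache at 40% full, writing 30GB/hour of temporary files
print(headroom_hours(100, 0.40, 30))  # 2.0
```

If the real write rate during the monthly crunch is much higher, the
window shrinks accordingly.)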
The second problem would be, again, solved or at least mitigated,
because the Coda clients would have "emergency backup" storage. The
temporary files would be written to /coda and land in the cache. Since
those files only live for a few minutes and are never requested by other
clients in the farm, they should never need to go over the network
unless the cache fills up. If the cache does fill up, which it will at
least once a month, performance will degrade (significantly) but at
least the requests will still be processed. Once the backlog gets
handled (takes about a day) the caches will clear out and everything
will go back to normal. If the storage array fills up to the point
where we're running out of disk space again, we can add another (much
smaller) one just for this temporary storage. This way we could avoid
dropping huge sums on a quad-CPU box, which is what we're doing now,
just to add storage capacity.
Now after that lengthy story and rationale, my first question is
obvious: Is my reasoning correct? Is this something Coda can do, even
though most people aren't using it quite like I want to?
Second question should also be pretty obvious and it's a FAQ: Is Coda
reliable enough to be used for this? I know the FAQ says "no" but our
current solution is already terribly unreliable (we lose a ton of
reports every month due to the disk space problems alone; I have no hard
figures but I'd estimate up to an eighth of the monthly requests are
lost, forcing our customers to request their reports manually a day or
week later). As long as Coda can more-or-less guarantee that the
archived reports on the storage array won't get trashed, we can live
with some flakiness on the client side. If the major reliability concern
is having to restart Coda processes occasionally, we can do that.
Third question is related to the first two: If Coda is not, for whatever
reason, appropriate for this, is there something similar which is? I've
looked at Lustre, and probably will continue looking at it, but it seems
geared towards much larger clusters than ours. It may also be much less
of a "drop-in" replacement than Coda, which is by now a standard part of
the free Unix-like OSes.
Thanks in advance for any tips or pointers, and to anyone who actually
read this whole thing, I admire your stamina!