Andrzej Bialecki wrote:
> What I need is a way to _temporarily_ make them localized to a particular machine, just for performance reasons, and without having to copy them out of DFS ...
Your current solution is to copy them to the local FS. This is unacceptable because it (a) is slow and (b) uses too much space.
Re (a): I don't see how the solution you suggest could be any faster. If anything it could be slower, since, in addition to copying a file's blocks locally, one would also need to make room for these blocks by relocating blocks that were local to other nodes. Ideally these would be other index file blocks, but it might not work out that well.
Re (b): Since you'll be copying indexes out to local storage, could you reduce their replication count from three to two? That would free up some space on each node (about the right amount, in fact). If the DFS copy becomes incomplete, then you'd have to either re-create the index manually or copy a local version. That is not quite as convenient as having DFS handle your disk failures, but with a replication of two this should still be rare.
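To make the "about the right amount" arithmetic concrete, here is a rough sketch. The function name and the 100 GB figure are made up for illustration, and it assumes each index is copied out to exactly one node's local disk:

```python
def cluster_bytes(index_size: int, dfs_replication: int, local_copies: int) -> int:
    """Total disk used cluster-wide: DFS replicas plus copies held outside DFS."""
    return index_size * (dfs_replication + local_copies)


INDEX_GB = 100  # hypothetical total index size, for illustration only

# DFS alone at replication three, no local copies.
before = cluster_bytes(INDEX_GB, dfs_replication=3, local_copies=0)

# DFS at replication two, plus one local copy of each index.
after = cluster_bytes(INDEX_GB, dfs_replication=2, local_copies=1)

print(before, after)  # both totals come out the same
```

Under those assumptions the space freed by dropping one DFS replica is the same as the space the local copies take up.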
Disks are awfully big these days. I'm surprised your disks are so full that an index that is small enough to be searched quickly by a single node takes up a significant amount of the disk. Or are you copying the entire segment locally? I would hope that DFS would be fast enough for summary and cache requests. With a cluster of ten nodes and ten hits displayed per page, each node should only need to handle one summary request per query. Cache requests are rarer still.
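The per-node summary load can be estimated as follows. This is only a back-of-the-envelope sketch: the function name is invented, and it assumes the hits on a page are spread evenly across the nodes, so the result is an average and not a guarantee for any one query:

```python
def summary_requests_per_node(hits_per_page: int, nodes: int) -> float:
    """Average summary requests each node serves per query, assuming even spread."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return hits_per_page / nodes


# Ten hits per page over a ten-node cluster.
print(summary_requests_per_node(hits_per_page=10, nodes=10))  # 1.0
```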
Doug
