2010/10/15 <cinap_len...@gmx.de>:
> i wonder if making 9p work better over high latency connections is
> even the right answer to the problem.

The reason I care is that the link from a CPU node to a file server on
Blue Gene is high latency. It might as well be cross-country, it's so
bad.

> would it not be cool to have a way to
> teleport/migrate your process to a cpu server close to the data?

If only ... but Blue Gene is very heavy! It's too hard to carry around
like that.

> i know, this is a crazy blue sky idea that has lots of problems on its
> own... but it poped up again when i read the "bring the computation
> to the data" point from the ospray talk.

It's a very attractive idea that has been tried extensively at
different times and places for several decades, and in almost all
cases it has not worked out.

Here's a streaming idea that worked out well. In 1994, Larry McVoy,
while at SGI, did something called "NFS bypass". Basically the code did
the standard NFS lookup/getattr cycle, but when it came time to do the
read, what was returned was a TCP socket that in effect implemented
streamed data from server to client. Flow control became a TCP problem,
and SGI TCP did a good job with it. At that point the (fast for the
time) 800 Mbit/second HIPPI channel ran at full rate, as opposed to the
much slower NFS file-oriented read.

Simple stream-oriented communications replaced all the games people
play to do prefetch, which in many cases is just making a
packet-oriented interface look like a stream anyway.

Look at it this way. Start rio under ratrace. Watch all the file IO.
You'd be hard pressed to find the ones that don't look like streams.

I also don't buy the "oh but this fails for a 2 GB file" argument.
Find me the 2 GB file you're using and then let's talk about not
streaming files that are too big.
But if you are using a 2 GB *output* file then you're almost always
better off streaming that file out -- also verified on the LLNL Blue
Gene, where the current checkpointing of 32 TB of memory looks like a
minor explosion, because the file system protocol is such a bad fit to
what is needed -- a stream from memory to the server.

Streaming has been shown to work for HPC systems because most of our
applications stream data in or out. The transition from a stream to
the packet-oriented file IO protocol has never been comfortable. That
doesn't mean one only does streams, just that one should not make them
impossible.

ron
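
P.S. A back-of-the-envelope sketch of why a request/response read loses
to a stream on a high-latency link. This is only a toy model, not a
measurement of 9P, NFS, or the NFS bypass code. The function names and
every number in the usage note are made up for illustration. It assumes
one outstanding read at a time (no pipelining or prefetch) and a stream
that runs at full link rate after a single round trip, so it ignores
TCP slow start and protocol overhead.

```python
def request_response_seconds(size_bytes, block_bytes, rtt_s, bandwidth_bps):
    """Toy model: one outstanding read at a time, so every block
    pays a full round trip on top of its wire time."""
    blocks = -(-size_bytes // block_bytes)  # ceiling division
    return blocks * rtt_s + size_bytes * 8 / bandwidth_bps


def stream_seconds(size_bytes, rtt_s, bandwidth_bps):
    """Toy model: one round trip to open the stream, then the link
    is assumed to run at full rate for the whole transfer."""
    return rtt_s + size_bytes * 8 / bandwidth_bps


if __name__ == "__main__":
    # Hypothetical numbers: 100 MB file, 8 KB reads, 50 ms round trip,
    # 800 Mbit/s link. None of these come from a real system.
    size, block, rtt, bw = 100_000_000, 8192, 0.050, 800_000_000
    print("request/response:", request_response_seconds(size, block, rtt, bw))
    print("stream:          ", stream_seconds(size, rtt, bw))
```

In this model the wire time is the same either way; the difference is
the round-trip term, which is multiplied by the number of blocks in the
request/response case and paid once in the streaming case. Prefetch and
multiple outstanding requests shrink that term, which is the sense in
which they make a packet-oriented interface look like a stream.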