On 4/14/07, Lluís Batlle <[EMAIL PROTECTED]> wrote:


Now I still cannot imagine what kind of files/directories 'ts' should
serve to offer a good interface to the user, and how 'enqueuing'
should work, getting a nice 'RPC' through that filesystem.
I'll think of it, sure.


Lluís, I did something like this for the 9grid, for taskbags. I've
also written similar tools for Unix over the years, so I think I see
what you are up to. It's not identical to ts, just similar in nature.

The server I wrote for Plan 9 was originally a port of the TNT work
described here:
http://arxiv.org/PS_cache/astro-ph/pdf/9912/9912134v1.pdf. The TNT
server maintained task state describing work to be done, and
multiplexed that work to clients. One of the nice things about TNT, not
mentioned in the paper, is that it supports "third party" or middleman
clients, i.e. clients can become servers and grab a bunch of tasks to
be handed out. This distribution can be important if you have enough
nodes and turn tasks over very frequently. Also, of course, clients
can put new work back into the bag of tasks. The work to be done was
communicated between the server and clients via RPC (SunRPC). So, I
began to do a straight conversion of this Unix socket-based server to
Plan 9; about 10 minutes into the job, I realized I was not thinking
about the problem correctly.

I thought about what I would do if starting from scratch, on Plan 9.
Plan 9 makes writing servers so easy that I went that route. I dumped
the RPC-based model and created a server that represented the state of
the work as directories and files. Basically, the server has three
directories, questions, working, and answers. You put questions into
the questions directory by opening and writing a file. To pass your
'namespace', you could redirect the output of an ns command into the
questions file you created, then put the work in as a set of commands
or an rc script in that file. Each new piece of work is a new file.

Clients start working by opening a questions file. Once a client has
opened a questions file, subsequent 'ls' commands will show that file
in the working directory, not the questions directory.  To see what
work is being done, you ls the working directory. Once a process is
done, it closes the file; at that point, the file will appear in the
answers directory, IF it has been written to. If the file was not
written to, it reappears in the questions directory (this is to handle
the case of clients that die). To see the answers to date, ls answers
or cat answers/*. If you want programs to find out when work is
completed, you could extend the server by having it create a 'status'
file at the top level; programs can read that and block on it, and
your server can distribute status info to them as jobs are done.
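
The lifecycle described above can be modelled in a few lines. What follows
is not the 9grid server code, just a rough in-memory sketch in Python; the
class name TaskBag and its methods (put, open, write, close, ls) are
hypothetical names chosen to mirror the file operations, and a real server
would expose them through 9P rather than method calls.

```python
class TaskBag:
    """In-memory model of the questions/working/answers lifecycle.

    Hypothetical sketch: a real server would present these transitions
    as file operations on three directories.
    """

    def __init__(self):
        self.questions = {}  # name -> work text, waiting for a client
        self.working = {}    # name -> work text, currently held open
        self.answers = {}    # name -> result text, finished
        self._written = {}   # name -> result written while held open

    def put(self, name, work):
        """Creating and writing a file in questions adds a task."""
        self.questions[name] = work

    def open(self, name):
        """Opening a question moves it to working and returns the work."""
        work = self.questions.pop(name)
        self.working[name] = work
        return work

    def write(self, name, result):
        """A client writes its result to the file it holds open."""
        if name not in self.working:
            raise KeyError(name)
        self._written[name] = self._written.get(name, "") + result

    def close(self, name):
        """On close: to answers if written, otherwise back to questions."""
        work = self.working.pop(name)
        if name in self._written:
            self.answers[name] = self._written.pop(name)
        else:
            self.questions[name] = work  # e.g. the client died

    def ls(self, directory):
        """List the task names currently shown in one directory."""
        dirs = {"questions": self.questions,
                "working": self.working,
                "answers": self.answers}
        return sorted(dirs[directory])
```

The point of the sketch is that the "move" between directories is never
requested by the client; it is a side effect of open and close, decided by
the server.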

With this new Plan 9 server, I was able to do distributed computation
entirely with shell scripts.
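
To give a feel for how little a client needs, here is a rough sketch of one
worker step, written in Python rather than rc. Everything in it is an
assumption for illustration: the function name work_once, the idea that the
task bag is mounted at some directory 'root', and the guess that an open
which loses a race simply fails.

```python
import os

def work_once(root, run):
    """Take one task from <root>/questions, run it, write the result back.

    'root' is a hypothetical mount point for the task bag; 'run' is a
    caller-supplied function mapping the work text to a result string.
    Returns the task name, or None if no question could be taken.
    """
    qdir = os.path.join(root, "questions")
    for name in sorted(os.listdir(qdir)):
        try:
            f = open(os.path.join(qdir, name), "r+")
        except OSError:
            continue  # assumption: someone else got it first; try the next
        with f:
            work = f.read()
            f.seek(0, os.SEEK_END)  # append the result after the work text
            f.write(run(work))
        return name  # closing the file is what hands the answer back
    return None
```

Run against an ordinary directory this only appends to the file; the
movement from questions to working to answers is the server's doing.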

One old problem with task bags relates to multiple users. What if more
than one user writes to the task bag? This has been solved in complex
ways on Unixen, with authentication happening on the server so that
only the 'right' people get to the socket. This is solved trivially in
Plan 9 -- each user, needing this service, starts a task bag server --
problem solved.

The beauty of a server is that you can stop thinking in terms of RPC!
The "files" create a structure that is normally done in RPC. So, on
Plan 9, try to avoid falling into old Unix patterns. If you start
thinking in terms of a 'server' socket, and not a 9p server, and if
you start writing RPC code, the odds are good that you are not
approaching the problem correctly.

I ported a few HPC computations to my taskbag server; it was pretty
trivial. I *think* I left the code on sources (man 9fs to see what
this statement means) in the 9grid directory; you can look.

BTW, this sort of 'file-based' task bag idea has been done to death as
well, usually on NFS and with a huge boatload of scripts to go with
it. One system I saw, for MPEG encoding, had a 100-page (or so) script
to use NFS files for holding tasks. 99 pages were for dealing with the
problems that come with using NFS for this purpose ... not being able
to guarantee exclusive create, open, and remove makes these things
messy.
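
For readers who have not fought this: the primitive those scripts need is
an atomic "claim", usually built on exclusive create. A minimal Python
sketch, assuming a local filesystem where O_EXCL is honoured; the function
name try_claim and the '.lock' suffix are made up for illustration.

```python
import os

def try_claim(task_dir, task_name, worker_id):
    """Claim a task by exclusively creating '<task_name>.lock'.

    Returns True if this worker won the claim, False if the lock file
    already exists. Correct only where exclusive create is atomic.
    """
    lock_path = os.path.join(task_dir, task_name + ".lock")
    try:
        # O_CREAT | O_EXCL fails if the file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(worker_id + "\n")  # record who holds the task
    return True
```

When the filesystem cannot guarantee that step, two workers can both
believe they hold the same task, and the rest of the script is spent
detecting and repairing that.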

It's not just the idea of using files to describe work that is so
effective; it's the fact that the operations on the files are under
control of your server, and hence they're not really files, but a
representation of the work, shown to you as files. This distinction
is very important. It's why a Plan 9 task bag server can work, and the
NFS-based task bag servers fail, and fail badly.

It's nice to have you involved in Plan 9; welcome aboard. Don't
hesitate to ask questions. It will be nice to see what you create.

thanks

ron
