Nice idea!

You might want to take a closer look at
http://gliteui.wks.gorlaeus.net/LGI, as LGI already defines and
describes a very similar API (for LGI obviously ;-), and that
might help the developers and designers avoid some pitfalls. See
especially http://gliteui.wks.gorlaeus.net/LGI/docs/LGI.pdf, starting
from page 22.

One important point missing from the design (and one that will turn out
to be very important in the future, based on my experience with HPC
clusters etc.) is user management. You will certainly want to manage
which users are allowed to submit jobs, and perhaps even let them assign
priorities to their submitted jobs; at the very least you want ACLs to
manage users and groups of users. Please take this into account from the
beginning of the design rather than as something 'added on' at a later stage.
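
To make the ACL point a bit more concrete, here is a minimal Python sketch of a check a submission API could run before accepting a job. Everything in it (User, AppPolicy, check_submission, the per-application group list, priority cap and queued-job limit) is a hypothetical name of mine, not something from the RemoteJobs design or from LGI:

```python
from dataclasses import dataclass, field

# Hypothetical ACL model; none of these names come from BOINC or LGI.
@dataclass
class User:
    name: str
    groups: set[str] = field(default_factory=set)

@dataclass
class AppPolicy:
    allowed_groups: set[str]      # groups allowed to submit to this application
    max_priority: int = 0         # highest priority a user may request
    max_queued_jobs: int = 1000   # per-user limit on queued jobs

class AccessDenied(Exception):
    pass

def check_submission(user, app_name, policies, queued, priority):
    """Raise AccessDenied unless `user` may queue one more job for `app_name`.

    `queued` maps user name -> number of jobs that user already has queued.
    """
    policy = policies.get(app_name)
    if policy is None:
        raise AccessDenied(f"unknown application {app_name!r}")
    if not (user.groups & policy.allowed_groups):
        raise AccessDenied(f"{user.name} may not submit to {app_name}")
    if priority > policy.max_priority:
        raise AccessDenied(f"priority {priority} exceeds limit {policy.max_priority}")
    if queued.get(user.name, 0) >= policy.max_queued_jobs:
        raise AccessDenied(f"{user.name} already has {policy.max_queued_jobs} queued jobs")
```

The point is mainly that the check sits in one place in the submission path from day one, so priorities and limits apply to every job instead of being bolted on later.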

Another pointer along those lines is the use of x509
certificate credentials in your API, together with a certificate
revocation list or some other way of revoking users' access or
certificates. You can use the CN of the x509 cert to identify a user and
the groups that user belongs to. Again, perhaps look at the LGI
documentation (the PDF above) on page 31. Perhaps also think about
implementing ACLs and job limits as extended fields within
the x509 certs?
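
As a rough illustration of using the CN for identification: Python's ssl module, on a server socket whose context has verify_mode set to ssl.CERT_REQUIRED, returns the verified client certificate from getpeercert() as a dict with 'subject' and 'serialNumber' entries. The sketch below assumes that dict shape; the CN-to-groups table and the set of revoked serial numbers are hypothetical stand-ins for a real group database and CRL:

```python
from typing import Optional

# Hypothetical CN -> groups table; could live in a DB or in cert extensions.
GROUPS_BY_CN = {"Jane Doe": {"chemistry"}}

# Hypothetical stand-in for a CRL: revoked serial numbers as upper-case hex.
REVOKED_SERIALS = {"0A1B2C"}

def common_name(cert: dict) -> Optional[str]:
    """Return the subject CN from a dict shaped like SSLSocket.getpeercert()."""
    # 'subject' is a tuple of RDNs, each a tuple of (key, value) pairs.
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    return None

def authorize(cert: dict) -> tuple[str, set[str]]:
    """Reject revoked or CN-less certs; otherwise return (CN, groups)."""
    if cert.get("serialNumber", "").upper() in REVOKED_SERIALS:
        raise PermissionError("certificate has been revoked")
    cn = common_name(cert)
    if cn is None:
        raise PermissionError("certificate has no commonName")
    return cn, GROUPS_BY_CN.get(cn, set())
```

For a real CRL, the ssl module can also do the revocation check during the handshake: load the CRL with SSLContext.load_verify_locations() and set verify_flags to ssl.VERIFY_CRL_CHECK_LEAF.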

Also remember that a simple form of user management was
implemented a few years back in the queueing code from Leiden Classical
(now part of the BOINC code). That simple approach might be worth
another look during the design of your API. The ACL design of LGI
(in the PDF) is much more advanced, but it was designed using the
experience gained from the simple ACL implementation in that queue code.

Oh, and also think about input checking: if several users start to
submit jobs, one of them is bound to make a mistake at some point.
If many jobs with invalid inputs are injected into the database,
BOINC volunteers will most likely notice and start complaining.
Better to do some sanity checks on the input, per application if
possible.
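
Per-application sanity checks could be as simple as a registry of validator functions that the submission API consults before anything is written to the database. A minimal sketch; the registry, the decorator and the 'classical' parameter names (steps, temperature) are all made up for illustration:

```python
from typing import Callable

# Hypothetical registry: app name -> function returning a list of problems
# found in a job's input parameters (an empty list means "looks sane").
Validator = Callable[[dict], list[str]]
VALIDATORS: dict[str, Validator] = {}

def validator(app_name):
    """Decorator that registers a validator for one application."""
    def register(fn):
        VALIDATORS[app_name] = fn
        return fn
    return register

@validator("classical")
def check_classical(params):
    problems = []
    steps = params.get("steps")
    if not isinstance(steps, int) or steps <= 0:
        problems.append("steps must be a positive integer")
    temp = params.get("temperature")
    if not isinstance(temp, (int, float)) or temp <= 0:
        problems.append("temperature must be a positive number")
    return problems

def validate_job(app_name, params):
    """Return the problems with a job's input; reject unknown applications."""
    check = VALIDATORS.get(app_name)
    if check is None:
        return [f"no validator registered for {app_name!r}"]
    return check(params)
```

Unknown applications are rejected rather than passed through, so a project that forgets to register a validator fails loudly instead of silently queueing unchecked jobs.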

Another pointer (based on experience with grids and with getting LGI
started as a pilot job framework,
http://www.nikhef.nl/pub/projects/grid/gridwiki/index.php/LGI_Pilotjob_Framework)
is to make sure your API scales. Submitting a single job should not have
too much latency; it should not take hours to submit just a few thousand
jobs. Submission of many jobs is to be expected for BOINC; nobody will set up
a BOINC project for only a few calculations ;-).
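
One common way to keep per-job latency low is to let a client submit a whole batch in one call and write it in a single database transaction, instead of one round trip and one commit per job. A minimal sketch using Python's sqlite3 module; the job table is a toy stand-in, not BOINC's actual schema, and submit_batch is a hypothetical name:

```python
import sqlite3

def submit_batch(conn, app_name, inputs):
    """Insert many jobs in one transaction and return how many were queued."""
    rows = [(app_name, payload, "queued") for payload in inputs]
    with conn:  # commits once at the end, rolls back everything on error
        conn.executemany(
            "INSERT INTO job (app_name, input, state) VALUES (?, ?, ?)", rows
        )
    return len(rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE job (id INTEGER PRIMARY KEY, app_name TEXT, input TEXT, state TEXT)"
)
n = submit_batch(conn, "classical", (f"seed={i}" for i in range(5000)))
print(n, conn.execute("SELECT COUNT(*) FROM job").fetchone()[0])
```

This prints `5000 5000`. A batch either goes in completely or not at all, which also makes retries after a failed submission simpler.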

Obviously, here in Leiden we just use LGI and treat the BOINC servers
as resources within that infrastructure, so we get all the needed
scaling, ACLs, security and APIs (see for instance
http://boinc.gorlaeus.net/ClassicalBuilder.php, the Classical Builder,
a Java 3D GUI that uses the LGI API to submit Classical jobs to LGI;
these are picked up by our BOINC servers and scheduled within BOINC
to the desktop grid). This allows us to combine clusters,
supercomputers, EGEE grid resources and BOINC behind a single API. We
also made Python and R interfaces to that API, which adds extra
flexibility for scientists running many, many, many jobs.

Finally, think about the 'aborted' state of jobs: if a job is already
running on some hosts, it cannot easily be aborted directly (and
removed from the DB). Perhaps one should introduce an 'aborting' job
status next to 'aborted', to distinguish jobs that were already sent
out from those that can be stopped and removed immediately because
no host was working on them. Or maybe one does not care about that
subtle difference?
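
The 'aborting' vs 'aborted' distinction boils down to a small state machine: an abort moves a job straight to 'aborted' only when no host holds an instance of it, and otherwise to 'aborting' until the last outstanding instance has reported back or timed out. A minimal Python sketch; the state names and the outstanding-instance counter are assumptions of mine, not part of the design doc:

```python
from enum import Enum

class JobState(Enum):
    QUEUED = "queued"
    RUNNING = "running"      # at least one instance was sent to a host
    ABORTING = "aborting"    # abort requested, but hosts still hold instances
    ABORTED = "aborted"      # nothing outstanding; safe to remove from the DB

class Job:
    def __init__(self):
        self.state = JobState.QUEUED
        self.outstanding = 0  # instances sent to hosts and not yet reported

    def send_instance(self):
        if self.state in (JobState.ABORTING, JobState.ABORTED):
            raise RuntimeError("job is being aborted; do not send more work")
        self.outstanding += 1
        self.state = JobState.RUNNING

    def abort(self):
        if self.state is JobState.ABORTED:
            return
        # No host has the job: it can be stopped and removed right away.
        self.state = JobState.ABORTED if self.outstanding == 0 else JobState.ABORTING

    def instance_finished(self):
        # Called when a host reports back, errors out, or misses its deadline.
        self.outstanding = max(0, self.outstanding - 1)
        if self.state is JobState.ABORTING and self.outstanding == 0:
            self.state = JobState.ABORTED
```

If nobody cares about the subtle difference, ABORTING can simply be reported to users as 'aborted'; the scheduler still needs it internally to stop handing out new instances.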

Hope to have given some useful info...

m.

Quoting David Anderson <da...@ssl.berkeley.edu>:

> We plan to add an API for remote job submission soon.
> The design doc is here:
>
> http://boinc.berkeley.edu/trac/wiki/RemoteJobs
>
> Please read and comment.  I know that several projects
> have implemented their own mechanism of this sort,
> and I'd like to hear about their experience and ideas.
>
> -- David



_______________________________________________
boinc_dev mailing list
boinc_dev@ssl.berkeley.edu
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.
