[galaxy-dev] Defining Job Runners Dynamically

John Chilton Sat, 15 Oct 2011 19:43:23 -0700

Hello All,

I just issued a pull request that augments Galaxy to allow defining job runners dynamically at runtime (https://bitbucket.org/galaxy/galaxy-central/pull-request/12/dynamic-job-runners). Whether it makes the cut or not, I thought I would describe the enhancements here in case anyone else would find them useful.

There are a couple of use cases we hope this will help us address at our institution. One is dynamically switching queues based on the user (we have a very nice shared memory resource that can only be used by researchers with NIH funding). The other is inspecting input sizes to give more accurate max walltimes to PBS (a small number of cufflinks jobs, for instance, take over three days on our cluster, but defining max walltimes in excess of that for all jobs could result in our queue sitting idle around our monthly downtimes). You might also imagine using this to switch queues entirely based on input sizes or parameters, or to alter queue priorities based on the submitting user or input sizes/parameters.

There are two steps to use this - you must add a line in universe.ini and define a function to compute the true job runner string in the new file lib/galaxy/jobs/rules.py.

The first step is similar to what you would do to statically assign a tool to a particular job runner. If you would like to dynamically assign a job runner for cufflinks, you would start by adding a line like one of the following to universe.ini:


cufflinks = dynamic:///python
-or-
cufflinks = dynamic:///python/compute_runner

If you use the first form, a function called cufflinks must be defined in rules.py. Adding the extra argument after python/ lets you specify a particular function by name (compute_runner in this example). This second option lets you assign job runners with the same function for multiple tools.

The only other step is to define a Python function in rules.py that produces a string corresponding to a valid job runner such as "local:///" or "pbs:///queue/-l walltime=48:00:00/".
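As a rough illustration of that step, here is a minimal sketch of what rules.py might contain to serve the two universe.ini forms shown earlier. The function bodies are placeholders of my own; only the function names and the two runner strings come from the description above.

```python
# Hypothetical contents of lib/galaxy/jobs/rules.py (a sketch, not the file
# from the pull request).


def cufflinks():
    # Looked up for "cufflinks = dynamic:///python": the function name
    # matches the tool id.
    return "pbs:///queue/-l walltime=48:00:00/"


def compute_runner():
    # Looked up for "cufflinks = dynamic:///python/compute_runner".
    # Several tools could point at this same function.
    return "local:///"
```

Either function only has to return a string that Galaxy would accept as a statically configured job runner.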

If the functions defined in this file take in arguments, these arguments should have names from the following list: job_wrapper, user_email, app, job, tool, tool_id, job_id, user. The plumbing will map these arguments to the implied Galaxy object. For instance, job_wrapper is the JobWrapper instance for the job that gets passed to the job runner, user_email is the user's email address or None, and app is the main application configuration object used throughout the code base, which can be used, for instance, to get values defined in universe.ini. job, tool, and user are model objects, and job_id and tool_id are the relevant ids.
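To make the name-based mapping concrete, here is one way such plumbing could be written. This is not the code from the pull request: call_rule, the available dictionary, and example_rule are names invented for this sketch, and it uses inspect.signature from the Python 3 standard library.

```python
import inspect


def call_rule(rule_function, available):
    """Call rule_function, passing only the arguments it names.

    `available` maps each supported argument name (job, user_email, ...)
    to the object that should be passed for it.
    """
    wanted = inspect.signature(rule_function).parameters
    unknown = [name for name in wanted if name not in available]
    if unknown:
        raise TypeError("unsupported rule argument(s): %s" % ", ".join(unknown))
    kwargs = dict((name, available[name]) for name in wanted)
    return rule_function(**kwargs)


def example_rule(user_email, tool_id):
    # Hypothetical rule: anonymous users (user_email is None) run locally.
    if user_email is None:
        return "local:///"
    return "pbs:///queue/-l walltime=48:00:00/"
```

With this arrangement a rule only declares the objects it needs, so a function that takes just user_email never has to know about the job or the tool.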

If you are writing a function that routes a certain list of users to a particular queue or increases their priority, you will probably only need to take in one argument - user_email. However, if you are going to look at input file sizes you may want to take in an argument called job and use the following piece of code to find the input size for the input named "input1" in the tool xml.

    import os

    # Map each input name from the tool xml to its dataset.
    inp_data = dict( [ ( da.name, da.dataset ) for da in job.input_datasets ] )
    inp_data.update( [ ( da.name, da.dataset ) for da in job.input_library_datasets ] )

    input1_file = inp_data[ "input1" ].file_name
    input1_size = os.path.getsize( input1_file )
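Putting the two cases together, a pair of rules along these lines might look like the sketch below. The email address, the shared_mem queue name, the 10 GB threshold, and the 96 hour walltime are all made-up values for illustration; only the job attributes and the runner string pattern come from the description above.

```python
import os

# Hypothetical values, chosen only for illustration.
NIH_FUNDED_USERS = set(["pi@example.org"])
LARGE_INPUT_BYTES = 10 * 1024 ** 3


def route_by_user(user_email):
    # user_email may be None, which simply fails the membership test.
    if user_email in NIH_FUNDED_USERS:
        return "pbs:///shared_mem/-l walltime=48:00:00/"
    return "local:///"


def walltime_for_size(size_in_bytes):
    # Larger inputs get a longer max walltime.
    if size_in_bytes >= LARGE_INPUT_BYTES:
        return "96:00:00"
    return "48:00:00"


def cufflinks(job):
    # Same lookup as above: input name from the tool xml -> dataset.
    inp_data = dict((da.name, da.dataset) for da in job.input_datasets)
    inp_data.update((da.name, da.dataset) for da in job.input_library_datasets)
    input1_size = os.path.getsize(inp_data["input1"].file_name)
    return "pbs:///queue/-l walltime=%s/" % walltime_for_size(input1_size)
```

Keeping the size-to-walltime decision in its own function makes the thresholds easy to check without a real Galaxy job object.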

This whole concept works for a couple of small tests on my local machine, but there are certain aspects of the job runner code that make me feel there may be corner cases I am not seeing where this approach may not work - so your mileage may vary.


-John

------------------------------------------------
John Chilton
Software Developer
University of Minnesota Supercomputing Institute
Office: 612-625-0917
Cell: 612-226-9223
E-Mail: chil...@msi.umn.edu

___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/
