Re: [galaxy-dev] Specifying number of requested cores to Galaxy DRMAA

Louise-Amélie Schmitt Thu, 19 May 2011 06:56:50 -0700

Hi again Leandro

Well I might not have been really clear, perhaps I should have re-read the mail before posting it :)

The thing is, the issue was not Torque starting jobs when there were not enough resources available, but rather Torque believing that each job needed fewer resources than it actually did (e.g. always assuming the jobs were single-threaded even if the tools actually needed more than one core). If Torque is properly notified of the needed resources, it will dispatch the jobs or make them wait accordingly (since it knows the nodes' limits and load), like your LSF does.

This hack is not very sexy, but it just notifies Torque of the cores needed by every multithreaded tool, so it doesn't run a multithreaded job when there's only one core available on the chosen node.
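
As a rough illustration of what Torque ends up being told, here is a minimal sketch that renders the same request as a qsub-style resource list. The helper name torque_resource_list and the example values are hypothetical; only the nodes=1:ppn=N and mem forms come from the pbs.py change quoted below.

```python
def torque_resource_list(cores, mem=None):
    """Build a Torque resource list asking for `cores` cores on one node.

    Hypothetical helper for illustration only; it is not part of Galaxy.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    parts = ["nodes=1:ppn=%d" % cores]
    if mem:
        parts.append("mem=%s" % mem)
    return ",".join(parts)


# A four-thread tool needing 8 GB would be requested roughly as:
#   qsub -l nodes=1:ppn=4,mem=8gb job.sh
print(torque_resource_list(4, "8gb"))
```

With this information Torque can keep the job queued until a node with enough free cores is available, instead of placing it on a node with a single free core.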


Hope that helps :)

Regards,
L-A


On 05/19/2011 03:05 PM, Leandro Hermida wrote:

Hi Louise-Amelie,

Thank you for the post reference, this is exactly what I was looking for. For us, for example, when I want to execute a tool that is a Java command, the JVM will typically use multiple cores as it's running. You said that with TORQUE it will crash when there aren't enough resources when the job is submitted. I wonder if you can do the same thing we have done here with LSF? With LSF you can configure a maximum server load for each node, and if the submitted jobs push the node load above this threshold (e.g. more cores requested than available), LSF will temporarily suspend jobs (using some kind of heuristics) so that the load stays below the threshold, and unsuspend them as resources become available. So for us things will just run slower when we cannot pass the requested number of cores to LSF.

I would think maybe there is a way with TORQUE to have it achieve the same thing, so jobs don't crash when more resources are requested than are available?


regards,
Leandro

2011/5/19 Louise-Amélie Schmitt <louise-amelie.schm...@embl.de>


    Hi,

    In a previous message, I explained how I handled multithreading
    for certain jobs; perhaps you can modify the corresponding files
    for drmaa in a similar way:

    On 04/26/2011 11:26 AM, Louise-Amélie Schmitt wrote:

    Just one little fix on line 261:
    261                 if ( len(l) > 1 and l[0] == job_wrapper.tool.id ):

    Otherwise it pathetically crashes when non-multithreaded jobs are
    submitted. Sorry about that.
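
One plausible explanation for the crash, not confirmed in the thread: when no line matches the tool id, the loop never reaches its break and reads the file to the end, and any empty line splits into an empty list, so l[0] raises IndexError. The sketch below mimics the test on line 261 with and without the len() guard; the function name and the tool id are made up for illustration.

```python
def matches(line, tool_id, guarded):
    """Mimic the test on line 261, with or without the len() guard."""
    fields = line.split()
    if guarded:
        return len(fields) > 1 and fields[0] == tool_id
    return fields[0] == tool_id  # raises IndexError when fields is empty


blank = "\n"  # e.g. a trailing empty line in multithreading.csv

# With the guard, the empty line is simply treated as a non-match.
print(matches(blank, "some_tool", guarded=True))

# Without the guard, the same line raises IndexError.
try:
    matches(blank, "some_tool", guarded=False)
except IndexError as err:
    print("unguarded test failed:", err)
```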

    Regards,
    L-A

    On Tuesday 19 April 2011 at 14:33 +0200, Louise-Amélie Schmitt wrote:

    Hello everyone,

    I'm using TORQUE with Galaxy, and we noticed that if a tool is
    multithreaded, the number of needed cores is not communicated to pbs,
    leading to job crashes if the required resources are not available when
    the job is submitted.

    Therefore I modified the code a little, as follows, in
    lib/galaxy/jobs/runners/pbs.py:

    256         # define PBS job options
    257         attrs.append( dict( name = pbs.ATTR_N, value = str( "%s_%s_%s" % ( job_wrapper.job_id, job_wrapper.tool.id, job_wrapper.user ) ) ) )
    258         mt_file = open('tool-data/multithreading.csv', 'r')
    259         for l in mt_file:
    260                 l = string.split(l)
    261                 if ( l[0] == job_wrapper.tool.id ):
    262                         attrs.append( dict( name = pbs.ATTR_l, resource = 'nodes', value = '1:ppn='+str(l[1]) ) )
    263                         attrs.append( dict( name = pbs.ATTR_l, resource = 'mem', value = str(l[2]) ) )
    264                         break
    265         mt_file.close()
    266         job_attrs = pbs.new_attropl( len( attrs ) + len( pbs_options ) )

    (sorry it didn't come out very well due to line breaking)

    The csv file contains a list of the multithreaded tools, each line
    containing:
    <tool id>\t<number of threads>\t<memory needed>\n
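
For readers who want to try the file format on its own, here is a minimal standalone sketch of reading such a file into a lookup table. It is written for Python 3 and uses the csv module rather than the string.split call in the snippet above, so it splits on tabs only; the function name and the sample tool ids are hypothetical.

```python
import csv
import io


def load_multithreading_table(handle):
    """Map tool id -> (threads, memory) from a tab-separated file.

    Lines with fewer than three fields (blank lines included) are skipped.
    """
    table = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3:
            continue
        tool_id, threads, memory = row[0], int(row[1]), row[2]
        table[tool_id] = (threads, memory)
    return table


# Hypothetical file contents; real tool ids and limits will differ.
sample = io.StringIO("example_aligner\t4\t8gb\n\nexample_assembler\t8\t16gb\n")
table = load_multithreading_table(sample)
print(table.get("example_aligner"))
print(table.get("single_threaded_tool"))  # not listed, so no extra request
```

A tool that is absent from the table gets no extra resource request, which matches the behaviour of the loop above when no line matches.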

    And it works fine, the jobs wait for their turn properly, but
    information is duplicated. Perhaps there would be a way to include
    something similar in galaxy's original code (if it is not already the
    case, I may not be up-to-date) without duplicating data.

    I hope that helps :)

    Best regards,
    L-A

    ___________________________________________________________
    The Galaxy User list should be used for the discussion of
    Galaxy analysis and other features on the public server
    at usegalaxy.org.  Please keep all replies on the list by
    using "reply all" in your mail client.  For discussion of
    local Galaxy instances and the Galaxy source code, please
    use the Galaxy Development list:

       http://lists.bx.psu.edu/listinfo/galaxy-dev

    To manage your subscriptions to this and other Galaxy lists,
    please use the interface at:

       http://lists.bx.psu.edu/



    On 05/19/2011 12:03 PM, Leandro Hermida wrote:

    Hi,

    When Galaxy is configured to use the DRMAA job runner, is there a
    way for a tool to tell DRMAA the number of cores it would like to
    request? The equivalent of bsub -n X in LSF, where X is the minimum
    number of cores to have available on the node.

    best,
    leandro


    ___________________________________________________________
    Please keep all replies on the list by using "reply all"
    in your mail client.  To manage your subscriptions to this
    and other Galaxy lists, please use the interface at:

       http://lists.bx.psu.edu/

