Re: [galaxy-dev] Creating multiple datasets in a libset

Rob Leclerc Mon, 29 Apr 2013 13:12:34 -0700

Hi Dannon,

I've written some code to (i) query a dataset to ensure that it has been
uploaded after a submit and (ii) ensure that a resulting dataset has been
written to the filesystem.


import time

# Block until all datasets have been uploaded
libset = submit(api_key, api_url + "libraries/%s/contents" % library_id,
                data, return_formatted=False)
for ds in libset:
    ds_url = api_url + 'libraries/%s/contents/%s' % (library_id, ds['id'])
    while True:
        uploaded_file = display(api_key, ds_url, return_formatted=False)
        # misc_info stays None until the upload has been processed
        if uploaded_file['misc_info'] is None:
            time.sleep(1)
        else:
            break

# Block until all result datasets have been saved to the filesystem
result_ds_url = api_url + 'histories/' + history_id + '/contents/' + dsh['id']
while True:
    result_ds = display(api_key, result_ds_url, return_formatted=False)
    if result_ds["state"] == 'ok':
        break
    else:
        time.sleep(1)
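
One caveat with the loops above: they poll forever if a dataset never
reaches the expected state (for example, if a job ends in an 'error'
state). A minimal sketch of a bounded version follows, assuming Python 3.
The name wait_until and its parameters are hypothetical, chosen for
illustration and not part of the Galaxy API scripts; fetch stands in for a
call such as display(api_key, url, return_formatted=False).

```python
import time


def wait_until(fetch, is_ready, interval=1.0, timeout=300.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call fetch() repeatedly until is_ready(result) is true.

    Returns the first result that satisfies is_ready, or raises
    RuntimeError once `timeout` seconds have passed without success.
    `sleep` and `clock` are injectable so the loop can be tested
    without real waiting.
    """
    deadline = clock() + timeout
    while True:
        result = fetch()
        if is_ready(result):
            return result
        if clock() >= deadline:
            raise RuntimeError("timed out after %s seconds" % timeout)
        sleep(interval)
```

With a helper like this, the second loop might reduce to
wait_until(lambda: display(api_key, result_ds_url, return_formatted=False),
lambda ds: ds["state"] == "ok").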


Rob Leclerc, PhD
<http://www.linkedin.com/in/robleclerc> <https://twitter.com/#!/robleclerc>
P: (US) +1-(917)-873-3037
P: (Shanghai) +86-1-(861)-612-5469
Personal Email: rob.lecl...@aya.yale.edu


On Mon, Apr 29, 2013 at 11:18 AM, Dannon Baker <dannon.ba...@gmail.com> wrote:

> Yep, that example filesystem_paths you suggest should work fine.  The
> sleep() bit was a complete hack from the start, for simplicity in
> demonstrating a very basic pipeline, but what you probably want to do for a
> real implementation is query the dataset in question via the API, verify
> that the datatype/etc have been set, and only after that execute the
> workflow, instead of relying on sleep().
>
>
> On Mon, Apr 29, 2013 at 9:24 AM, Rob Leclerc <robert.lecl...@gmail.com> wrote:
>
>> Hi Dannon,
>>
>> Thanks for the response. Sorry to be pedantic, but just to make sure that
>> I understand the interpretation of this field on the other side of the API,
>> I would need to have something like the following:
>>
>> data['filesystem_paths'] = "/home/me/file1.vcf \n /home/me/file2.vcf \n /home/me/file3.vcf"
>>
>> I assume I should also increase the time.sleep() to reflect the uploading
>> of extra files?
>>
>> Cheers,
>>
>> Rob
>>
>> Rob Leclerc, PhD
>> <http://www.linkedin.com/in/robleclerc><https://twitter.com/#!/robleclerc>
>> P: (US) +1-(917)-873-3037
>> P: (Shanghai) +86-1-(861)-612-5469
>> Personal Email: rob.lecl...@aya.yale.edu
>>
>>
>> On Mon, Apr 29, 2013 at 9:15 AM, Dannon Baker <dannon.ba...@gmail.com> wrote:
>>
>>> Hey Rob,
>>>
>>> That example_watch_folder.py submits exactly one file at a time, executes
>>> the workflow, and then moves on to the next, all in separate transactions.
>>> If you wanted to upload multiple filepaths at once, you'd just append more
>>> to the 'filesystem_paths' field (newline-separated paths).
>>>
>>> -Dannon
>>>
>>>
>>> On Fri, Apr 26, 2013 at 11:54 PM, Rob Leclerc
>>> <robert.lecl...@gmail.com> wrote:
>>>
>>>> I'm looking at example_watch_folder.py and it's not clear from the
>>>> example how you submit multiple datasets to a library. In the example, the
>>>> first submit returns a libset [] with only a single entry and then proceeds
>>>> to iterate through each dataset in the libset in the following section:
>>>>
>>>> data = {}
>>>> data['folder_id'] = library_folder_id
>>>> data['file_type'] = 'auto'
>>>> data['dbkey'] = ''
>>>> data['upload_option'] = 'upload_paths'
>>>> data['filesystem_paths'] = fullpath
>>>> data['create_type'] = 'file'
>>>> libset = submit(api_key, api_url + "libraries/%s/contents" % library_id,
>>>>                 data, return_formatted=False)
>>>> time.sleep(5)
>>>> for ds in libset:
>>>>     if 'id' in ds:
>>>>         wf_data = {}
>>>>         wf_data['workflow_id'] = workflow['id']
>>>>         wf_data['history'] = "%s - %s" % (fname, workflow['name'])
>>>>         wf_data['ds_map'] = {}
>>>>         for step_id, ds_in in workflow['inputs'].iteritems():
>>>>             wf_data['ds_map'][step_id] = {'src': 'ld', 'id': ds['id']}
>>>>         res = submit(api_key, api_url + 'workflows', wf_data,
>>>>                      return_formatted=False)
>>>>
>>>>
>>>>
>>>> Rob Leclerc, PhD
>>>> <http://www.linkedin.com/in/robleclerc><https://twitter.com/#!/robleclerc>
>>>> P: (US) +1-(917)-873-3037
>>>> P: (Shanghai) +86-1-(861)-612-5469
>>>> Personal Email: rob.lecl...@aya.yale.edu
>>>>
>>>> ___________________________________________________________
>>>> Please keep all replies on the list by using "reply all"
>>>> in your mail client.  To manage your subscriptions to this
>>>> and other Galaxy lists, please use the interface at:
>>>>   http://lists.bx.psu.edu/
>>>>
>>>> To search Galaxy mailing lists use the unified search at:
>>>>   http://galaxyproject.org/search/mailinglists/
>>>>
>>>
>>>
>>
>
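
On the newline-separated 'filesystem_paths' value discussed in the quoted
messages: a minimal sketch of building it from a list of paths. Whether
Galaxy trims whitespace around each path is not confirmed in this thread,
so the sketch strips it to be safe. build_filesystem_paths is a
hypothetical helper name, and the paths are the example ones from the
thread.

```python
def build_filesystem_paths(paths):
    """Join paths into one newline-separated string.

    Surrounding whitespace is stripped from each path and empty
    entries are dropped, so no stray spaces end up next to the
    newline separators.
    """
    cleaned = [p.strip() for p in paths if p.strip()]
    return "\n".join(cleaned)


data = {}
data['filesystem_paths'] = build_filesystem_paths([
    "/home/me/file1.vcf",
    "/home/me/file2.vcf",
    "/home/me/file3.vcf",
])
```

The resulting data dict would then be passed to submit() in the same way
as the single-path example.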

