Hi Dannon,

I'm stoked that Charles's PR made it into master -- thanks!

Have you had a chance to look into the job_work and temp extra_dir issue?
Please let me know if there's anything I can do to help out!
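For reference, here is a minimal sketch of how the `<cache>` and `<extra_dir>` settings for the Swift backend in the config quoted below could be read. It uses only Python's standard `xml.etree.ElementTree`. `swift_dirs` is a hypothetical helper for illustration, not Galaxy's `_parse_config_xml`. It only makes explicit which paths the config asks for, so they can be compared with where the job working directory actually lands. The log line below shows it under `database/object_store_cache/000/1`.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the quoted object_store_conf.xml. The <auth>, <bucket>
# and <connection> tags are omitted because this sketch does not use them.
CONF = """<object_store type="hierarchical">
    <backends>
        <object_store type="swift" id="primary" order="0">
            <cache path="database/object_store_cache" size="1000"/>
            <extra_dir type="temp" path="database/tmp"/>
            <extra_dir type="job_work" path="database/job_working_directory"/>
        </object_store>
        <object_store type="disk" id="secondary" order="1">
            <files_dir path="database/files"/>
        </object_store>
    </backends>
</object_store>
"""


def swift_dirs(conf_xml):
    """Return (cache_path, {extra_dir type: path}) for the swift backend.

    Hypothetical helper for illustration only; not Galaxy's parser.
    """
    root = ET.fromstring(conf_xml)
    # First <object_store type="swift"> directly under <backends>.
    backend = root.find("./backends/object_store[@type='swift']")
    if backend is None:
        raise ValueError("no swift backend configured")
    cache = backend.find("cache")
    cache_path = cache.get("path") if cache is not None else None
    # Map each extra_dir type (e.g. "temp", "job_work") to its path.
    extra_dirs = {e.get("type"): e.get("path") for e in backend.findall("extra_dir")}
    return cache_path, extra_dirs


if __name__ == "__main__":
    cache_path, extra_dirs = swift_dirs(CONF)
    print(cache_path)              # database/object_store_cache
    print(extra_dirs["job_work"])  # database/job_working_directory
```

If the Swift store honored these settings the way the config reads, job working directories would land under the `job_work` path rather than under the cache path.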


Cheers,

Brian

On Mon, Feb 9, 2015 at 1:39 PM, Dannon Baker <dannon.ba...@gmail.com> wrote:

> Hey Brian,
>
> Thanks for the interest in Galaxy's Swift object store!  I also tested
> Charles' PR and it looks like a nice improvement -- I'll go ahead and get
> that pulled into Galaxy shortly.
>
> The HierarchicalObjectStore was written with exactly what you're trying to
> do in mind, so you're definitely on the right track here.  I'll see if I
> can verify and fix the file location issues you point out and will get back
> to you.
>
> -Dannon
>
>
>
> On Mon Feb 09 2015 at 4:29:35 PM Brian Claywell <bclay...@fredhutch.org>
> wrote:
>
>> Our local instance currently uses the traditional directories under
>> `database/` for datasets, job working directories, and temporary
>> files. Ultimately we wish to transition to using our Swift object
>> store for storage. We've been doing some experimentation with Galaxy's
>> Swift backend and have run into a few issues.
>>
>> The first major issue we came across was Swift's 5 GB segment size
>> limit, since the segmentation/multipart upload code is bypassed for
>> instances of SwiftObjectStore [1]. SwiftStack support provided a patch
>> enabling multipart uploads for Swift (PR #648) which has been working
>> well for us so far. (Thanks, Charles!)
>>
>> The next issue is that the path attribute of the cache tag in
>> object_store_conf.xml appears to be ignored. The value does get stored
>> in self.cache_path in _parse_config_xml, but elsewhere in s3.py
>> self.staging_path is used instead.
>>
>> Finally, adding extra_dir tags to the Swift object store config
>> doesn't appear to do anything. Here's my object_store_conf.xml:
>>
>> <?xml version="1.0"?>
>> <object_store type="hierarchical">
>>     <backends>
>>         <object_store type="swift" id="primary" order="0">
>>             <auth access_key="..." secret_key="..."/>
>>             <bucket name="galaxy_store"/>
>>             <connection host="tin.fhcrc.org" port="443"/>
>>             <cache path="database/object_store_cache" size="1000"/>
>>             <extra_dir type="temp" path="database/tmp"/>
>>             <extra_dir type="job_work" path="database/job_working_directory"/>
>>         </object_store>
>>         <object_store type="disk" id="secondary" order="1">
>>             <files_dir path="database/files"/>
>>         </object_store>
>>     </backends>
>> </object_store>
>>
>> The goal with the hierarchical setup above is for new datasets to be
>> created in the primary (Swift) object store, caching to
>> `database/object_store_cache`, while the job and temporary directories
>> remain at `database/job_working_directory` and `database/tmp`,
>> respectively. Existing (pre-Swift) datasets remain in `database/files`
>> and are handled by the secondary disk store.
>>
>> What actually happens (after renaming self.cache_path to
>> self.staging_path in _parse_config_xml to get the cache path working)
>> is this:
>>
>> galaxy.jobs DEBUG 2015-02-06 16:07:26,615 (1) Working directory for job is: /home/bclaywel/workspace/galaxy-central/database/object_store_cache/000/1
>>
>> That is, the job working directory is created directly under the cache
>> path's hash directories. Temp files would probably end up there as
>> well.
>>
>> We're quite excited to get Galaxy and Swift working well together, and
>> I'm more than happy to help debug and test!
>>
>>
>> Cheers,
>>
>> Brian
>>
>>
>> [1] https://bitbucket.org/galaxy/galaxy-central/src/54ed3adb6575addba47d627944ebd72f7547082d/lib/galaxy/objectstore/s3.py?at=default#cl-331
>>
>> --
>> Brian Claywell | programmer/analyst
>> Matsen Group   | http://matsen.fredhutch.org
>> Fred Hutchinson Cancer Research Center
>> ___________________________________________________________
>> Please keep all replies on the list by using "reply all"
>> in your mail client.  To manage your subscriptions to this
>> and other Galaxy lists, please use the interface at:
>>   https://lists.galaxyproject.org/
>>
>> To search Galaxy mailing lists use the unified search at:
>>   http://galaxyproject.org/search/mailinglists/
>
>


