Re: [galaxy-dev] Loc file configuration question

Federico De Masi Thu, 10 May 2012 13:31:41 -0700

Hi,

I agree that *.loc files should be consolidated or, at least, we should have proper documentation about which loc file is required for each tool...

Took me quite some time to find that solution.

I would suggest having an open wiki where these things are annotated. I.e., if I find a "trick" or a shortcut to something, shouldn't we have a centralised place to share it with the community, rather than having Nate and co. waste their precious time answering the same questions over and over?

Mailing lists are cool, but they are sometimes hard to search for the proper answers...


My 2p, or maybe I missed something...

Cheers,

Fred


On 10/05/2012 15:35, Raja Kelkar wrote:

Hi Fred,

Thanks for the tip on the alignseq file. It did work (I now have
sequence that came back from the tool; I will have to check whether it is correct).

Anyone have a logical explanation?

Perhaps these myriad loc files can be streamlined down to something
simple in future.

Thanks.

Dan: The entries I had in local loc files were all tab delimited.

On Wed, May 9, 2012 at 4:49 PM, Federico De Masi <[email protected]> wrote:

    Hi,

    I was having the same issue just today and my solution was to add:

    seq     mm9     /path_to/twobit/mm9.2bit

    in the alignseq.loc file, as .nib has been replaced by .2bit,
    plus all the necessary entries in all_fasta.loc and twobit.loc.

    That worked :)
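A minimal Python sketch of the entry described above, assuming alignseq.loc takes three tab-separated columns (the literal type `seq`, the dbkey, and the .2bit path) exactly as in the line shown; `alignseq_line` and `append_entry` are hypothetical helper names, not part of Galaxy.

```python
from pathlib import Path


def alignseq_line(dbkey: str, twobit_path: str) -> str:
    """Build one alignseq.loc 'seq' entry: type, dbkey and path joined by tabs."""
    if not twobit_path.endswith(".2bit"):
        # The fix above points the entry at a .2bit file instead of .nib files.
        raise ValueError(f"expected a .2bit file, got {twobit_path!r}")
    return "\t".join(["seq", dbkey, twobit_path])


def append_entry(loc_file: Path, dbkey: str, twobit_path: str) -> None:
    """Append the entry to loc_file unless an identical line is already there."""
    line = alignseq_line(dbkey, twobit_path)
    text = loc_file.read_text() if loc_file.exists() else ""
    if line in text.splitlines():
        return
    # Keep the new entry on its own line even if the file lacks a final newline.
    prefix = "\n" if text and not text.endswith("\n") else ""
    with loc_file.open("a") as fh:
        fh.write(prefix + line + "\n")
```

For example, `append_entry(Path("/galaxy-dist/tool-data/alignseq.loc"), "mm9", "/path_to/twobit/mm9.2bit")`, where the tool-data location is an assumption borrowed from the layout Jen describes further down. Writing the tab explicitly avoids the spaces-versus-tabs problem raised in the thread.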

    Hope this helps.

    Fred



    On 09/05/2012 22:40, Daniel Blankenberg wrote:

        Hi Raja,

        Can you check that your fields are tab-separated and not
        space-separated (they are spaces below, but that could be a
        copy-and-paste artifact)?
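This check can be automated. A rough Python sketch, assuming that lines starting with '#' are comments (as in the *.loc.sample files) and that every data line should have at least two tab-separated fields; `find_space_separated` is a hypothetical helper, not a Galaxy tool.

```python
import sys


def find_space_separated(lines, min_fields=2):
    """Return (line_number, line) pairs that look space- rather than tab-separated.

    Blank lines and '#' comment lines are skipped.
    """
    suspicious = []
    for number, line in enumerate(lines, start=1):
        stripped = line.rstrip("\n")
        if not stripped.strip() or stripped.lstrip().startswith("#"):
            continue
        tab_fields = stripped.split("\t")
        whitespace_fields = stripped.split()
        # Too few tab fields but enough whitespace-separated tokens: likely spaces.
        if len(tab_fields) < min_fields and len(whitespace_fields) >= min_fields:
            suspicious.append((number, stripped))
    return suspicious


if __name__ == "__main__":
    # Usage: python check_loc.py twobit.loc all_fasta.loc ...
    for path in sys.argv[1:]:
        with open(path) as fh:
            for number, line in find_space_separated(fh):
                print(f"{path}:{number}: no tab separator: {line!r}")
```

The heuristic only flags lines with no usable tab split; a line that mixes tabs and spaces inside a field would need a stricter, per-file column count.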


        Thanks for using Galaxy,

        Dan


        On May 9, 2012, at 9:45 AM, Raja Kelkar wrote:

            Hi Jen,

            Thank you for your response. I seem to have all the relevant
            entries in the two "*.loc" files you mentioned (the paths in
            all_fasta.loc and twobit.loc differ because of the way we store
            the files. I also converted the 2bit files to .fa and have them
            available in the same twobit directory).

            But the feature extraction is still not working.

            Here are the relevant entries in the files (I have redacted
            specific file paths and replaced them with "path_to"):

            twobit.loc

            hg18 /path_to/twobit/hg18.2bit
            hg19 /path_to/twobit/hg19.2bit
            mm9 /path_to/twobit/mm9.2bit
            mm8 /path_to/twobit/mm8.2bit

            all_fasta.loc

            hg19full hg19 Human (Homo sapiens): hg19 Full /path_to/hg19/bwa_path/hg19___all.fa
            hg19_chr_only hg19_chr Human (Homo sapiens): hg19_chrom_only /path_to/hg19/bwa_path/hg19.fa
            hg18full hg18 Human (Homo sapiens): hg18 Full /path_to/hg18/bwa_path/hg18___all.fa
            hg18_chr_only hg18_chr Human (Homo sapiens): hg18_chrom_only /path_to/hg18/bwa_path/hg18___chrom_only.fa


            I assume that the second field (<dbkey>) in the all_fasta.loc
            file has to match an entry in the builds.txt file in the "ucsc"
            directory. Is that correct? It does in this case. I think I am
            missing something subtle here.
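One way to test this assumption is to compare the two files directly. A Python sketch, assuming the all_fasta.loc columns are unique id, dbkey, display name and path (as in the entries above), that builds.txt is tab-separated with the build key in its first column, and that '#' starts a comment; the helper names are hypothetical.

```python
import sys


def read_column(lines, column):
    """Yield one 0-based, tab-separated column from non-blank, non-comment lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) > column:
            yield fields[column]


def unknown_dbkeys(all_fasta_lines, builds_lines):
    """Return dbkeys used in all_fasta.loc (2nd column) missing from builds.txt (1st)."""
    known = set(read_column(builds_lines, 0))
    return sorted({key for key in read_column(all_fasta_lines, 1) if key not in known})


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python check_dbkeys.py all_fasta.loc builds.txt
    with open(sys.argv[1]) as fasta, open(sys.argv[2]) as builds:
        for key in unknown_dbkeys(fasta, builds):
            print(f"dbkey {key!r} is not listed in builds.txt")
```

With entries like the ones above, keys such as `hg19_chr` would be reported unless builds.txt also lists them.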

            The "*.loc.sample" files are great, but the information in them
            is confusing. I am not sure why most sample loc files contain
            two examples of the same info (as far as I can tell).

            Thanks.


            On Tue, May 8, 2012 at 6:48 PM, Jennifer Jackson
            <[email protected]> wrote:

                Hi Raja,

                This tool uses a <database>.2bit file to extract sequence data
                when the 'Locally cached' option is used. The <database> is a
                genome that you install locally. The ".2bit" format was
                developed by UCSC, and they are the source for many genomes
                already in this format, as well as for tools (compiled and
                source) to convert fasta data to and from .2bit format
                (faToTwoBit and twoBitToFa):

                http://hgdownload.cse.ucsc.edu/downloads.html (genomes + source)
                http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/ (compiled utilities)
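To see which sequence names a .2bit file contains (for example, to confirm that its chromosome naming matches your datasets), only its header needs to be read. A Python sketch based on the .2bit layout UCSC documents: a 16-byte header (signature 0x1A412743, version, sequence count, reserved word) followed by an index of name-length, name and 32-bit offset records. `twobit_sequence_names` and `build_index` are hypothetical names; the UCSC utilities remain the standard tools.

```python
import io
import struct

TWOBIT_SIGNATURE = 0x1A412743  # magic number at the start of every .2bit file


def twobit_sequence_names(fh) -> list:
    """Return the sequence names stored in the index of an open binary .2bit file.

    Only the header and index are read, not the packed sequence data.
    """
    header = fh.read(16)
    if len(header) != 16:
        raise ValueError("file too short to be a .2bit file")
    # The signature reveals the byte order: try little-endian, then big-endian.
    for order in ("<", ">"):
        signature, version, count, _reserved = struct.unpack(order + "4I", header)
        if signature == TWOBIT_SIGNATURE:
            break
    else:
        raise ValueError("bad signature: not a .2bit file")
    if version != 0:
        raise ValueError(f"unsupported .2bit version {version}")
    names = []
    for _ in range(count):
        size = fh.read(1)
        if not size:
            raise ValueError("truncated .2bit index")
        names.append(fh.read(size[0]).decode("ascii"))
        fh.read(4)  # skip the 32-bit offset of this sequence's record
    return names


def build_index(names, order="<") -> bytes:
    """Build a header plus index (no sequence data), handy for trying the reader."""
    data = struct.pack(order + "4I", TWOBIT_SIGNATURE, 0, len(names), 0)
    for name in names:
        encoded = name.encode("ascii")
        data += bytes([len(encoded)]) + encoded + struct.pack(order + "I", 0)
    return data


if __name__ == "__main__":
    print(twobit_sequence_names(io.BytesIO(build_index(["chr1", "chrM"]))))
```

On a real genome, open the file with `open("/path_to/twobit/hg19.2bit", "rb")` and pass the handle in.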

                For the extract tool, the builds list is required:
                http://wiki.g2.bx.psu.edu/Admin/Data%20Integration

                You don't actually need any more NGS setup beyond that.
                Still, this wiki can help:
                http://wiki.g2.bx.psu.edu/Admin/NGS%20Local%20Setup

                For example, the <database>.2bit file could be placed with
                your .fa files like this:

                /galaxy-dist/tool-data/genome/<databaseA>/seq/<databaseA>.2bit <<
                /galaxy-dist/tool-data/genome/<databaseA>/seq/<databaseA>.fa
                /galaxy-dist/tool-data/genome/<databaseB>/bowtie/
                /galaxy-dist/tool-data/genome/<databaseB>/sam/
                /galaxy-dist/tool-data/genome/<databaseB>/seq/<databaseB>.2bit <<
                /galaxy-dist/tool-data/genome/<databaseB>/seq/<databaseB>.fa
                /galaxy-dist/tool-data/genome/<databaseC>/seq/<databaseC>.2bit <<
                /galaxy-dist/tool-data/genome/<databaseC>/seq/<databaseC>.fa
                /galaxy-dist/tool-data/genome/<databaseD>/seq/<databaseD>.2bit <<
                /galaxy-dist/tool-data/genome/<databaseD>/seq/<databaseD>.fa
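The layout above can be expressed as a simple path rule, which also makes it easy to generate twobit.loc entries. A Python sketch, assuming the /galaxy-dist/tool-data/genome root from the example and that twobit.loc takes a dbkey and a path separated by a tab (as in the twobit.loc entries quoted earlier in the thread); the helper names are hypothetical.

```python
from pathlib import Path

GENOME_ROOT = Path("/galaxy-dist/tool-data/genome")  # root used in the example layout


def twobit_path(dbkey: str, root: Path = GENOME_ROOT) -> Path:
    """Location of <dbkey>.2bit in the example layout: <root>/<dbkey>/seq/<dbkey>.2bit."""
    return root / dbkey / "seq" / f"{dbkey}.2bit"


def twobit_loc_lines(dbkeys, root: Path = GENOME_ROOT):
    """Yield 'dbkey<TAB>path' lines for genomes whose .2bit file actually exists."""
    for dbkey in dbkeys:
        path = twobit_path(dbkey, root)
        if path.is_file():
            yield f"{dbkey}\t{path}"
```

Skipping genomes whose .2bit file is missing keeps the generated file from pointing the tool at paths that do not exist.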


                Then the .loc file is here:

                /galaxy-dist/tool-data/twobit.loc.sample


                You will probably have this for all genomes as well:

                /galaxy-dist/tool-data/all_fasta.loc.sample


                Remove the ".sample" before using these. Instructions for how
                to populate each are in the files themselves.
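Removing the ".sample" suffix is often done by copying, so the pristine sample stays available for reference. A sketch of that step, assuming the /galaxy-dist/tool-data directory from the example; `install_loc_from_sample` is a hypothetical helper that never overwrites an existing .loc file.

```python
import shutil
from pathlib import Path


def install_loc_from_sample(tool_data: Path, name: str) -> Path:
    """Copy <name>.loc.sample to <name>.loc unless <name>.loc already exists."""
    sample = tool_data / f"{name}.loc.sample"
    target = tool_data / f"{name}.loc"
    if not target.exists():
        # Raises FileNotFoundError if the sample itself is missing.
        shutil.copyfile(sample, target)
    return target
```

For instance, `for name in ("twobit", "all_fasta"): install_loc_from_sample(Path("/galaxy-dist/tool-data"), name)`, then edit each .loc file following the instructions inside it.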

                The only gtf/gff files associated with this tool would be
                datasets from the history, so there are no gtf/gff data to
                stage before using the tool. To have the tool use a particular
                genome, set the query dataset (interval, bed, gtf) to have the
                same database identifier as you used for the "<database>" part
                of the "<database>.2bit" file. (This is why the builds list is
                required.)
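The matching rule can be pictured as a dictionary lookup keyed by dbkey. The Python sketch below is an illustration only, not Galaxy's actual code: `parse_twobit_loc` and `resolve_sequence` are hypothetical, and the error text simply mirrors the message reported in the original question further down.

```python
def parse_twobit_loc(lines):
    """Map dbkey -> .2bit path from tab-separated twobit.loc lines."""
    mapping = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) >= 2:  # space-separated lines split into one field and are skipped
            mapping[fields[0]] = fields[1]
    return mapping


def resolve_sequence(dataset_dbkey, twobit_entries):
    """Return the .2bit path for a dataset's dbkey, or fail like the reported error."""
    try:
        return twobit_entries[dataset_dbkey]
    except KeyError:
        raise LookupError(f"No sequences are available for '{dataset_dbkey}'") from None
```

In this sketch a dataset whose dbkey has no tab-separated twobit.loc entry fails the same way as one whose dbkey is simply absent, which is why both the dbkey match and the tab separators matter.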

                If you make changes to data, don't forget to restart your
                server to see the changes.

                Hopefully this helps,

                Jen
                Galaxy team


                On 5/8/12 12:46 PM, Raja Kelkar wrote:

                    I have two questions that pertain to a local install of
                    Galaxy:

                    1. I have been having trouble getting the “fetch
                    sequences” → “extract genomic DNA” tool to work. Can
                    someone identify the specific *.loc file that needs to
                    have the info about the location of the genome sequence
                    files?

                    I get the following error when I run the extract tool:

                    No sequences are available for 'hg19', request them by
                    reporting this error.


                    2. What configuration file(s) need to contain locations
                    for the gtf/gff files?


                    Thanks.




              _________________________________________________________________

                    Please keep all replies on the list by using "reply all"
                    in your mail client. To manage your subscriptions to
            this
                    and other Galaxy lists, please use the interface at:

            http://lists.bx.psu.edu/


                --
                Jennifer Jackson
                http://galaxyproject.org









    --
    Federico De Masi, PhD,
    Assistant Professor
    The Technical University of Denmark - DTU
    Center for Biological Sequence Analysis - CBS
    Kemitorvet 208/002
    DK-2800 KGS. LYNGBY, DENMARK
    Telephone: (+45) 45 25 24 21
    Fax: (+45) 45 93 15 85
    http://rg.cbs.dtu.dk






