Re: [galaxy-dev] Loc file configuration question

Federico De Masi Wed, 09 May 2012 13:46:14 -0700


On 09/05/2012 22:40, Daniel Blankenberg wrote:

Hi Raja,

Can you check that your fields are tab separated and not spaces (they
are spaces below, but that could be a copy and paste artifact)?
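
A quick way to test for this is a small script along the following lines. It is only a sketch: the function name check_loc_lines and the sample entries are made up for illustration, and it assumes the twobit.loc layout of two tab-separated fields, <build> and <path>.

```python
def check_loc_lines(lines, expected_fields):
    """Return (line_number, problem) tuples for entries that do not split
    into the expected number of tab-separated fields."""
    problems = []
    for number, line in enumerate(lines, start=1):
        entry = line.rstrip("\r\n")
        # Skip blank lines and '#' comment lines.
        if not entry.strip() or entry.lstrip().startswith("#"):
            continue
        fields = entry.split("\t")
        if len(fields) != expected_fields:
            if "\t" not in entry:
                hint = "no tab found, fields may be space separated"
            else:
                hint = "wrong field count"
            problems.append(
                (number, f"{len(fields)} field(s), expected {expected_fields}: {hint}")
            )
    return problems


# Hypothetical twobit.loc content: the third line uses a space, not a tab.
sample = [
    "# <build>\t<path>\n",
    "hg18\t/path_to/twobit/hg18.2bit\n",
    "hg19 /path_to/twobit/hg19.2bit\n",
]
for number, problem in check_loc_lines(sample, expected_fields=2):
    print(f"line {number}: {problem}")
```

To run it against a real file, pass open("twobit.loc").readlines() instead of the sample list, and use expected_fields=4 for all_fasta.loc.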


Thanks for using Galaxy,

Dan


On May 9, 2012, at 9:45 AM, Raja Kelkar wrote:

Hi Jen,

Thank you for your response. I seem to have all the relevant entries
in the two "*.loc" files you mentioned. (The paths in the all_fasta
and twobit files differ because of the way we have the files stored.
I also converted the 2bit files to .fa and have them available in the
same twobit directory.)

But the feature extraction is still not working.

Here are the relevant entries in files (I have redacted specific file
paths and replaced them with "path_to"):

twobit.loc

hg18 /path_to/twobit/hg18.2bit
hg19 /path_to/twobit/hg19.2bit
mm9 /path_to/twobit/mm9.2bit
mm8 /path_to/twobit/mm8.2bit

all_fasta.loc

hg19full hg19 Human (Homo sapiens): hg19 Full /path_to/hg19/bwa_path/hg19_all.fa
hg19_chr_only hg19_chr Human (Homo sapiens): hg19_chrom_only /path_to/hg19/bwa_path/hg19.fa
hg18full hg18 Human (Homo sapiens): hg18 Full /path_to/hg18/bwa_path/hg18_all.fa
hg18_chr_only hg18_chr Human (Homo sapiens): hg18_chrom_only /path_to/hg18/bwa_path/hg18_chrom_only.fa


I assume that the second field in the (all_fasta.loc) file <dbkey> has
to match the builds.txt file in the "ucsc" directory. Is that correct?
It does in this case. I think I am missing something subtle here.
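
As an illustration of that check, the sketch below lists any dbkey used in all_fasta.loc that is missing from builds.txt. It assumes both files are tab separated, that builds.txt has the dbkey in its first column, and that all_fasta.loc uses the <unique_build_id>, <dbkey>, <display_name>, <file_path> layout. The function names and the sample builds list are invented for the example and are not taken from an actual installation.

```python
def read_tabular(lines):
    """Yield the tab-separated fields of each non-blank, non-comment line."""
    for line in lines:
        entry = line.rstrip("\r\n")
        if entry.strip() and not entry.startswith("#"):
            yield entry.split("\t")


def unknown_dbkeys(all_fasta_lines, builds_lines):
    """Return the dbkeys used in all_fasta.loc that are absent from builds.txt."""
    # Assumed builds.txt layout: <dbkey><TAB><description>
    known = {fields[0] for fields in read_tabular(builds_lines)}
    # Assumed all_fasta.loc layout:
    # <unique_build_id><TAB><dbkey><TAB><display_name><TAB><file_path>
    used = [fields[1] for fields in read_tabular(all_fasta_lines) if len(fields) >= 2]
    return sorted(set(used) - known)


# Made-up sample data; the descriptions are placeholders.
builds = [
    "hg18\tHuman hg18 (placeholder description)\n",
    "hg19\tHuman hg19 (placeholder description)\n",
]
all_fasta = [
    "hg19full\thg19\tHuman (Homo sapiens): hg19 Full\t/path_to/hg19/bwa_path/hg19_all.fa\n",
    "hg19_chr_only\thg19_chr\tHuman (Homo sapiens): hg19_chrom_only\t/path_to/hg19/bwa_path/hg19.fa\n",
]
print(unknown_dbkeys(all_fasta, builds))
```

With this sample data, a dbkey such as hg19_chr would be reported unless the builds list also contains it.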

The "*.loc.sample" files are great but the information contained in
those is confusing. I am not sure why there are two examples of the
same info (as far as I can tell) in most sample loc files.

Thanks.


On Tue, May 8, 2012 at 6:48 PM, Jennifer Jackson wrote:

    Hi Raja,

    This tool uses a <database>.2bit file to extract sequence data
    when the 'Locally cached' option is used. The <database> is a
    genome that you install locally. The ".2bit" format was developed
    by UCSC, which is the source for many genomes already in this
    format and for tools (compiled and uncompiled) to transform fasta
    data into/from .2bit format (faToTwoBit and twoBitToFa):
    http://hgdownload.cse.ucsc.edu/downloads.html (genomes + source)
    http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/ (compiled
    utilities)

    For the extract tool, the builds list is required:
    http://wiki.g2.bx.psu.edu/Admin/Data%20Integration

    You don't actually need to have more NGS set up beyond that.
    Still, this wiki can help.
    http://wiki.g2.bx.psu.edu/Admin/NGS%20Local%20Setup

    For example, the <database>.2bit file could be placed with your
    .fa files like:

    /galaxy-dist/tool-data/genome/<databaseA>/seq/<databaseA>.2bit <<
    /galaxy-dist/tool-data/genome/<databaseA>/seq/<databaseA>.fa
    /galaxy-dist/tool-data/genome/<databaseB>/bowtie/
    /galaxy-dist/tool-data/genome/<databaseB>/sam/
    /galaxy-dist/tool-data/genome/<databaseB>/seq/<databaseB>.2bit <<
    /galaxy-dist/tool-data/genome/<databaseB>/seq/<databaseB>.fa
    /galaxy-dist/tool-data/genome/<databaseC>/seq/<databaseC>.2bit <<
    /galaxy-dist/tool-data/genome/<databaseC>/seq/<databaseC>.fa
    /galaxy-dist/tool-data/genome/<databaseD>/seq/<databaseD>.2bit <<
    /galaxy-dist/tool-data/genome/<databaseD>/seq/<databaseD>.fa

    Then the .loc file is here:

    /galaxy-dist/tool-data/twobit.loc.sample

    You will probably have this for all genomes as well:

    /galaxy-dist/tool-data/all_fasta.loc.sample

    Remove the ".sample" before using these. Instructions for how to
    populate each are in the files themselves.

    The only gtf/gff files associated with this tool would be datasets
    from the history, so there are no gtf/gff data to stage before
    using the tool. To have the tool use a particular genome, set the
    query dataset (interval, bed, gtf) to have the same database
    identifier as you used for the "<database>" part of the
    "<database>.2bit" file. (This is why the builds list is required).

    If you make changes to data, don't forget to restart your server
    to see the changes.

    Hopefully this helps,

    Jen
    Galaxy team


    On 5/8/12 12:46 PM, Raja Kelkar wrote:

        I have two questions that pertain to a local install of galaxy:

        1. I have been having trouble getting the “fetch sequences” →
        “extract genomic DNA” tool to work. Can someone identify the
        specific *.loc file that needs to have the info about the
        location of the genome sequence files?

        I get the following error when I run the extract tool:

        /No sequences are available for 'hg19', request them by
        reporting this error./


        2. What configuration file(s) need to contain locations for
        the gtf/gff files?


        Thanks.



        _____________________________________________________________
        Please keep all replies on the list by using "reply all"
        in your mail client. To manage your subscriptions to this
        and other Galaxy lists, please use the interface at:

        http://lists.bx.psu.edu/


    --
    Jennifer Jackson
    http://galaxyproject.org <http://galaxyproject.org/>



--
Federico De Masi, PhD,
Assistant Professor
The Technical University of Denmark - DTU
Center for Biological Sequence Analysis - CBS
Kemitorvet 208/002
DK-2800 KGS. LYNGBY, DENMARK
Telephone: (+45) 45 25 24 21
Fax: (+45) 45 93 15 85
http://rg.cbs.dtu.dk
