Hello Lilach,

The genome build 'hg_g1k_v37' is build "b37" in the GATK documentation. Hg19 is also included (as a distinct build). I encourage you to examine these if you are interested in crossing over between genomes or identifying other projects that have data based on the same genome build.

http://www.broadinstitute.org/gsa/wiki/index.php/Introduction_to_the_GATK ->
http://www.broadinstitute.org/gsa/wiki/index.php/GATK_resource_bundle

" GATK resource bundle: A collection of standard files for working with human resequencing data with the GATK.

The standard reference sequence we use in the GATK is the b37 edition from the Human Genome Reference Consortium. All of the key GATK data files are available against this reference sequence. Additionally, we used to use UCSC-style (chr1, not 1) for build hg18, and provide lifted-over files from b37 to hg18 for those still using those files.

b37 resources: the standard data set
* Reference sequence (standard 1000 Genomes fasta) along with fai and dict files
<more, please follow link for details ...>

hg19 resources: lifted over from b37
* Includes the UCSC-style hg19 reference along with all lifted over VCF files."
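
To illustrate the naming difference mentioned in the quote (UCSC-style chr1 versus b37-style 1), here is a minimal Python sketch. It is not part of Galaxy or the GATK, and the function names are hypothetical. It assumes a tab-delimited BED-like intervals file and only handles the primary chromosomes (1-22, X, Y). It deliberately skips chrM and the random/haplotype contigs: as far as I know, the hg19 chrM sequence is not the same as the b37 MT sequence, so renaming alone would not be correct there.

```python
import re

# Primary chromosomes only: chr1..chr22, chrX, chrY (UCSC) -> 1..22, X, Y (b37).
_PRIMARY = re.compile(r"^chr([1-9][0-9]?|X|Y)$")


def ucsc_to_b37_name(name):
    """Return the b37-style name for a UCSC-style primary chromosome, else None."""
    match = _PRIMARY.match(name)
    if not match:
        return None
    label = match.group(1)
    if label.isdigit() and int(label) > 22:
        return None
    return label


def rename_bed_lines(lines):
    """Yield BED-like lines with renamed chromosomes; skip lines that cannot be mapped."""
    for line in lines:
        # Skip blank lines and UCSC track/browser/comment lines.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        fields = line.rstrip("\n").split("\t")
        new_name = ucsc_to_b37_name(fields[0])
        if new_name is None:
            continue  # chrM, chrUn_*, *_random, *_hap* and similar are dropped
        yield "\t".join([new_name] + fields[1:])
```

This only changes the names; it does not check that the underlying sequences match, so please treat the output as something to verify rather than as a guaranteed conversion.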

Hopefully this helps,

Jen
Galaxy team

On 6/27/12 7:09 AM, Lilach Friedman wrote:
May I join Carlos's question? What exactly is hg_g1k_v37, and how can I get the intervals of specific genes in this format?

Thanks,
  Lilach


2012/6/27 Lilach Friedman <lilac...@gmail.com>

    Hi Jennifer,
    Is there a way to upload my files directly from the public Galaxy
    to my cloud Galaxy instance (in AWS)? Or should I download them
    to my computer first and then upload them? (That takes a long
    time because of the low upload speed.)

    Thanks,
       Lilach


    2012/6/26 Jennifer Jackson <j...@bx.psu.edu>

        Hello Lilach,

        Currently, the human reference genome indexed for the
        GATK-beta tools is 'hg_g1k_v37'. The GATK-beta tools are under
        active revision by our team, so we expect there to be little
        to no change to the beta version on the main public instance
        until this is completed.

        Attempting to convert data between different builds is not
        recommended. These tools are very sensitive to exact inputs,
        which extends to naming conventions, etc. The best practice
        path is to start and continue an analysis project with the
        same exact genome build throughout.

        If you want to use the hg19 indexes provided by the GATK
        project, a cloud instance is the current option (using a hg19
        genome as a 'custom genome' will exceed the processing limits
        available on the public Galaxy instance). Following the links
        on the GATK tools can provide more information about sources,
        including links on the GATK web site which will note the exact
        contents of both of these genome versions, downloads, and
        other resources.

        Hopefully this helps to clear up any confusion,

        Best,

        Jen
        Galaxy team


        On 6/21/12 7:50 AM, Lilach Friedman wrote:
        Hi Jennifer,
        Thank you for this reply.

        I made a new BWA file, this time using the hg19(full) genome.
        However, when I try to use DepthOfCoverage, the reference
        genome is stuck on hg_g1k_v37 (this is the only option to
        select), and I cannot change it to hg19(full). Most probably
        this is because I selected hg_g1k_v37 the previous time I
        tried to use DepthOfCoverage.
        Is this a bug? How can I change it?

        Thanks,
          Lilach


        2012/6/18 Jennifer Jackson <j...@bx.psu.edu>

            Hi Lilach,

            The problem with this analysis probably has to do with a
            mismatch between the genomes: the intervals obtained from
            UCSC (hg19) and the BAM from your BWA (hg_g1k_v37) run.

            UCSC does not contain the genome 'hg_g1k_v37' - the
            genome available from UCSC is 'hg19'.

            Even though these are technically the same human release,
            on a practical level, they have a different arrangement
            for some of the chromosomes. You can compare NCBI GRCh37
            <http://www.ncbi.nlm.nih.gov/genome/assembly/2758/> with
            UCSC hg19 <http://genome.ucsc.edu> for an explanation.
            Reference genomes must be /exact/ in order to be used
            with tools - base for base. When they are exact, the
            identifier will be exact between Galaxy and the source
            (UCSC, Ensembl) or the full Build name will provide
            enough information to make a connection to NCBI or other.

            Sometimes genomes are similar enough that a dataset
            sourced from one can be used with another, if the
            database attribute is changed and the data from the
            regions that differ is removed. This may be possible in
            your case; only trying will let you know how difficult it
            actually is with your analysis. The GATK pipeline is very
            sensitive to exact inputs. You will need to be careful
            with genome database assignments, etc. Following the
            links on the tool forms to the GATK help pages can
            provide some more detail about expected inputs, if this
            is something that you are going to try.
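
            Before attempting this, it can help to see which
            intervals name a contig that the reference does not
            contain at all. The following is a minimal Python sketch,
            with hypothetical function names, that assumes you have
            the reference's .fai index (tab-delimited, contig name in
            the first column) and a tab-delimited BED-like intervals
            file. It only compares names, not sequence content.

```python
def read_fai_contigs(fai_lines):
    """Return the set of contig names listed in a FASTA .fai index."""
    return {
        line.rstrip("\n").split("\t")[0]
        for line in fai_lines
        if line.strip()
    }


def split_intervals(bed_lines, contigs):
    """Split BED-like lines into (kept, dropped) by whether the contig is in the reference."""
    kept, dropped = [], []
    for line in bed_lines:
        if not line.strip():
            continue
        record = line.rstrip("\n")
        name = record.split("\t")[0]
        # An interval on "chr1" is dropped when the reference only knows "1".
        (kept if name in contigs else dropped).append(record)
    return kept, dropped
```

            If every interval ends up in the dropped list, that
            points to a naming mismatch such as chr1 versus 1 rather
            than to a few regions that differ.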

            Good luck with the re-run!

            Jen
            Galaxy team


            On 6/18/12 4:42 AM, Lilach Friedman wrote:
            Hi,
            I am trying to use Depth of Coverage to see the
            coverage in specific intervals.
            The intervals were taken from UCSC (exons of 2 genes),
            loaded into Galaxy, and the file type was changed to intervals.

            I gave Depth of Coverage two BAM files (produced by
            BWA, then selecting only rows matching the pattern
            XT:A:U, and then SAM-to-BAM)
            and the intervals file (in the advanced GATK options).
            The consensus genome is hg_g1k_v37.
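
            For reference, the XT:A:U selection step described above
            can be sketched in Python roughly as follows. This is an
            illustration only, not the Galaxy tool that was actually
            used; the function name is hypothetical, and it assumes
            plain-text SAM input in which BWA has written the XT tag
            as an optional field.

```python
def keep_unique_alignments(sam_lines):
    """Yield SAM header lines plus alignment lines tagged XT:A:U."""
    for line in sam_lines:
        # Header lines start with "@" and are passed through unchanged.
        if line.startswith("@"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # The first 11 columns are mandatory; optional tags start at column 12.
        if "XT:A:U" in fields[11:]:
            yield line
```

            Keeping the header lines matters because the later
            SAM-to-BAM step needs the @SQ records that name the
            reference contigs.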

            I got the following error message:

            An error occurred running this job: Picked up
            _JAVA_OPTIONS: -Djava.io.tmpdir=/space/g2main
            ##### ERROR ------------------------------------------------------------------------------------------
            ##### ERROR A USER ERROR has occurred (version 1.4-18-g80a4ce0):
            ##### ERROR The invalid argume


            Is it a bug, or did I do anything wrong?

            I will be grateful for any help.

            Thanks!
               Lilach


            ___________________________________________________________
            The Galaxy User list should be used for the discussion of
            Galaxy analysis and other features on the public server
            at usegalaxy.org <http://usegalaxy.org>. Please keep all
            replies on the list by
            using "reply all" in your mail client.  For discussion of
            local Galaxy instances and the Galaxy source code, please
            use the Galaxy Development list:

               http://lists.bx.psu.edu/listinfo/galaxy-dev

            To manage your subscriptions to this and other Galaxy lists,
            please use the interface at:

               http://lists.bx.psu.edu/

            --
            Jennifer Jackson
            http://galaxyproject.org




