I'm curious: what is the genome called 'hg_g1k_v37',
and how does it correspond to NCBI GRCh37, which is
identical to UCSC hg19?


Jennifer Jackson wrote:
UCSC does not contain the genome 'hg_g1k_v37' - the genome available from UCSC is 'hg19'.

Even though these are technically the same human release, on a practical level they differ in how some of the chromosomes are arranged. You can compare NCBI GRCh37 <http://www.ncbi.nlm.nih.gov/genome/assembly/2758/> with UCSC hg19 <http://genome.ucsc.edu> for an explanation. Reference genomes must be /exact/, base for base, in order to be used with tools. When they are exact, the identifier will be identical between Galaxy and the source (UCSC, Ensembl), or the full Build name will provide enough information to make a connection to NCBI or another source.
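
One way to check "base for base" yourself is to checksum each sequence in the two FASTA files and see which ones match. The following is only a minimal sketch: the function names are made up for illustration, it assumes plain uncompressed FASTA text, and it treats upper- and lower-case (soft-masked) bases as the same.

```python
import hashlib


def sequence_checksums(fasta_lines):
    """Return {sequence_name: MD5 hex digest of the uppercased bases}."""
    checksums = {}
    name, digest = None, None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                checksums[name] = digest.hexdigest()
            name = line[1:].split()[0]  # keep only the first word of the header
            digest = hashlib.md5()
        elif name is not None:
            digest.update(line.upper().encode("ascii"))
    if name is not None:
        checksums[name] = digest.hexdigest()
    return checksums


def identical_sequences(checksums_a, checksums_b):
    """Map names in build A to names in build B whose bases are identical."""
    by_digest = {digest: name for name, digest in checksums_b.items()}
    return {
        name: by_digest[digest]
        for name, digest in checksums_a.items()
        if digest in by_digest
    }
```

With real files you would pass open file handles, for example `sequence_checksums(open("build_a.fa"))`, where the file name is a placeholder. Sequences that do not appear in the result differ in at least one base, or are missing from the other build.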

Sometimes genomes are similar enough that a dataset sourced from one can be used with another, provided the database attribute is changed and the data from the regions that differ is removed. This may be possible in your case, but only trying it will tell you how difficult it actually is for your analysis. The GATK pipeline is very sensitive to exact inputs, so you will need to be careful with genome database assignments and similar settings. If this is something you are going to try, following the links on the tool forms to the GATK help pages will provide more detail about the expected inputs.
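
As a rough illustration of "removing the data from the regions that differ", here is a small sketch for tab-separated interval data (BED-like, chromosome name in the first column). The function name and the name mapping are hypothetical: you would build the mapping yourself, listing only the sequences you have confirmed are identical in both builds. It only handles differences at the level of whole sequences, and the database attribute still has to be changed in Galaxy separately.

```python
def convert_intervals(bed_lines, name_map):
    """Rename chromosomes using name_map and drop intervals on any
    sequence that has no entry in name_map."""
    kept = []
    for line in bed_lines:
        line = line.rstrip("\n")
        # Skip blank lines, comments, and track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        new_name = name_map.get(fields[0])
        if new_name is None:
            continue  # sequence differs or is absent in the target build
        fields[0] = new_name
        kept.append("\t".join(fields))
    return kept


# Illustrative mapping only; the names here are assumptions, not a
# verified list of which sequences match between two builds.
example_map = {"chr1": "1", "chr2": "2"}
example_rows = ["chr1\t100\t200\tfeatureA", "chrM\t5\t50\tfeatureB"]
converted = convert_intervals(example_rows, example_map)
```

In this example the `chrM` row is dropped because `example_map` has no entry for it.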