I'm curious: what is this genome called 'hg_g1k_v37',
and how does it correspond to NCBI GRCh37, which is
identical to UCSC hg19?
Jennifer Jackson wrote:
UCSC does not contain the genome 'hg_g1k_v37' - the genome available
from UCSC is 'hg19'.
Even though these are technically the same human release, on a practical
level, they have a different arrangement for some of the chromosomes.
You can compare NCBI GRCh37
<http://www.ncbi.nlm.nih.gov/genome/assembly/2758/> with UCSC hg19
<http://genome.ucsc.edu> for an explanation. Reference genomes must match
/exactly/, base for base, in order to be used with tools. When they
match, the identifier will be identical between Galaxy and the source (UCSC,
Ensembl), or the full Build name will provide enough information to make
a connection to NCBI or another source.
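
One way to check whether two reference FASTA files really agree base for base is to compare sequence names, lengths, and a checksum per sequence. The following is only a minimal sketch, not a Galaxy or GATK feature: the function names are made up for illustration, and it assumes that lower-case (soft-masked) bases should count as equal to upper-case ones.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple

Digest = Tuple[int, str]  # (sequence length, MD5 of the bases)


def fasta_digests(lines: Iterable[str]) -> Dict[str, Digest]:
    """Map each FASTA sequence name to its length and MD5 checksum.

    Assumption: bases are upper-cased first, so soft-masking is ignored.
    """
    digests: Dict[str, Digest] = {}
    name = None
    length = 0
    md5 = hashlib.md5()
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the previous sequence before starting a new one.
            if name is not None:
                digests[name] = (length, md5.hexdigest())
            fields = line[1:].split()
            name = fields[0] if fields else ""  # first token after '>'
            length = 0
            md5 = hashlib.md5()
        elif name is not None:
            seq = line.upper()
            length += len(seq)
            md5.update(seq.encode("ascii"))
    if name is not None:
        digests[name] = (length, md5.hexdigest())
    return digests


def compare_references(
    a: Dict[str, Digest], b: Dict[str, Digest]
) -> Tuple[List[str], List[str], List[str]]:
    """Return (names only in a, names only in b, shared names that differ)."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    differing = sorted(n for n in set(a) & set(b) if a[n] != b[n])
    return only_a, only_b, differing
```

With real files, one could pass open file handles, for example `fasta_digests(open("reference.fa"))`, where the file name is hypothetical. If all three returned lists are empty, the two files contain the same sequences under the same names.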
Sometimes genomes are similar enough that a dataset sourced from one can
be used with another, if the database attribute is changed and the data
from the regions that differ is removed. This may be possible in your
case, but only trying will tell you how difficult it actually is with
your analysis. The GATK pipeline is very sensitive to exact inputs. You
will need to be careful with genome database assignments, etc. Following
the links on the tool forms to the GATK help pages can provide some more
detail about expected inputs, if this is something that you are going to
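
The idea above of removing data from the regions that differ can be sketched for a tab-delimited interval file such as BED. This is an illustration under stated assumptions, not a tested conversion: the name mapping (hg_g1k_v37 using '1'..'22', 'X', 'Y' where hg19 uses 'chr1'..'chr22', 'chrX', 'chrY') and the choice to drop everything else, including the mitochondrial sequence and unplaced contigs, are assumptions to verify against your own reference files. The function name is hypothetical.

```python
from typing import Dict, Iterable, Iterator

# Assumed mapping, not taken from the original post: only the primary
# chromosomes are treated as equivalent between the two genomes.
NAME_MAP: Dict[str, str] = {str(n): f"chr{n}" for n in range(1, 23)}
NAME_MAP.update({"X": "chrX", "Y": "chrY"})


def convert_records(
    lines: Iterable[str], name_map: Dict[str, str] = NAME_MAP
) -> Iterator[str]:
    """Rename the first column and drop records on unmapped sequences."""
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.startswith("#"):
            yield line  # header/comment lines pass through unchanged
            continue
        fields = line.split("\t")
        new_name = name_map.get(fields[0])
        if new_name is None:
            continue  # sequence differs or has no counterpart: drop it
        fields[0] = new_name
        yield "\t".join(fields)
```

After a conversion like this, the dataset's database attribute in Galaxy would still need to be changed to the target genome. Header lines that themselves name sequences (as in VCF) are passed through untouched here and would need separate handling.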