Hi Lilach,

Sorry for the late response. Jen has just confirmed the disadvantages of
my approach. I don't know how difficult it would be for you to double
check that the coordinates in your interval file are correct for
hg_g1k_v37. If you are confident they will work and want to proceed,
you could do something like this outside of Galaxy (I'm sure you could
also find a way to do it inside Galaxy):

sed 's/^chr//' interval_file.csv > interval_file_g1k.csv

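If you have coordinates on the mitochondrial chromosome, you will also
need to rename it. UCSC hg19 calls it chrM, which the command above turns
into plain M, whereas hg_g1k_v37 calls it MT. Both renames can be done in
one pass:

sed -e 's/^chr//' -e 's/^M\([[:space:],]\)/MT\1/' interval_file.csv > interval_file_g1k.csv

Be aware that the UCSC hg19 chrM is also a different sequence from MT in
hg_g1k_v37, so mitochondrial coordinates may not carry over even after
renaming. Please double check this.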
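Before running DepthOfCoverage, a quick sanity check on the renamed file can catch contigs that hg_g1k_v37 does not know about, or intervals that run past the end of a chromosome. This is a minimal sketch, assuming a tab-separated interval file with chrom, start and end in the first three columns, and a samtools-style FASTA index (.fai, contig name and length in the first two columns) for the hg_g1k_v37 reference. The function name check_intervals and the file names are hypothetical. It only checks names and bounds; it cannot tell whether a coordinate means the same thing in both builds.

```shell
# check_intervals FAI INTERVALS   (hypothetical helper name)
# Assumes FAI is a FASTA index: contig name in column 1, length in column 2.
# Assumes INTERVALS is tab-separated with chrom, start, end in columns 1-3.
# Prints one line per problem and returns non-zero if any were found.
check_intervals() {
    awk -F'\t' '
        # First file: remember the length of every contig in the reference.
        NR == FNR { len[$1] = $2; next }
        # Skip blank lines and header/comment lines in the interval file.
        $0 == "" || $1 ~ /^#/ { next }
        # Contig name not present in the reference at all.
        !($1 in len) {
            printf "line %d: contig %s not in reference\n", FNR, $1
            bad++
            next
        }
        # Interval ends beyond the end of the contig.
        $3 + 0 > len[$1] + 0 {
            printf "line %d: end %s past end of %s (length %s)\n", FNR, $3, $1, len[$1]
            bad++
        }
        END { exit (bad > 0) }
    ' "$1" "$2"
}
```

For example, check_intervals human_g1k_v37.fasta.fai interval_file_g1k.csv (hypothetical file names) prints nothing and succeeds when every interval fits the reference.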

It would also be nice to have confirmation of exactly what hg_g1k_v37
is, and where annotations for it can be found. Would annotations from
Ensembl do?

Regards,
Carlos

On Mon, Jun 25, 2012 at 5:22 PM, Jennifer Jackson <j...@bx.psu.edu> wrote:
> Hello Lilach,
>
> Currently, the human reference genome indexed for the GATK-beta tools is
> 'hg_g1k_v37'. The GATK-beta tools are under active revision by our team, so
> we expect there to be little to no change to the beta version on the main
> public instance until this is completed.
>
> Attempting to convert data between different builds is not recommended.
> These tools are very sensitive to exact inputs, which extends to naming
> conventions, etc. The best practice path is to start and continue an
> analysis project with the same exact genome build throughout.
>
> If you want to use the hg19 indexes provided by the GATK project, a cloud
> instance is the current option (using an hg19 genome as a 'custom genome'
> will exceed the processing limits available on the public Galaxy instance).
> Following the links on the GATK tool forms can provide more information about
> sources, including links to the GATK web site, which notes the exact
> contents of both of these genome versions, downloads, and other
> resources.
>
> Hopefully this helps to clear up any confusion,
>
> Best,
>
> Jen
> Galaxy team
>
>
> On 6/21/12 7:50 AM, Lilach Friedman wrote:
>
> Hi Jennifer,
> Thank you for this reply.
>
> I made a new BWA file, this time using the hg19(full) genome.
> However, when I try to use DepthOfCoverage, the reference genome is
> stuck on hg_g1k_v37 (this is the only option available), and I cannot
> change it to hg19(full), most probably because I selected hg_g1k_v37 the
> previous time I used DepthOfCoverage.
> Is this a bug? How can I change it?
>
> Thanks,
>   Lilach
>
>
> 2012/6/18 Jennifer Jackson <j...@bx.psu.edu>
>>
>> Hi Lilach,
>>
>> The problem with this analysis probably has to do with a mismatch between
>> the genomes: the intervals obtained from UCSC (hg19) and the BAM from your
>> BWA (hg_g1k_v37) run.
>>
>> UCSC does not contain the genome 'hg_g1k_v37' - the genome available from
>> UCSC is 'hg19'.
>>
>> Even though these are technically the same human release, on a practical
>> level, they have a different arrangement for some of the chromosomes. You
>> can compare NCBI GRCh37 with UCSC hg19 for an explanation. Reference
>> genomes must be exact in order to be used with tools - base for base. When
>> they are exact, the identifier will be exact between Galaxy and the source
>> (UCSC, Ensembl) or the full Build name will provide enough information to
>> make a connection to NCBI or other.
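One way to see the naming and length differences between the two builds for yourself is to compare their FASTA indexes side by side. This is a rough sketch, assuming you have .fai files for both builds; the function name compare_fai is hypothetical. It strips the UCSC chr prefix, maps M to MT, and lists contigs that are missing from one build or have a different length. Matching names and lengths are necessary but not sufficient: only a sequence comparison shows the builds are identical base for base.

```shell
# compare_fai UCSC_FAI B37_FAI   (hypothetical helper name)
# Normalises UCSC names (chr1 -> 1, chrM -> MT) and reports contigs that are
# missing from either index or whose lengths differ.
compare_fai() {
    awk -F'\t' '
        # First file (UCSC-style): normalise the name, remember the length.
        NR == FNR {
            name = $1
            sub(/^chr/, "", name)
            if (name == "M") name = "MT"
            ucsc[name] = $2
            next
        }
        # Second file (b37-style): compare each contig with the UCSC one.
        {
            if (!($1 in ucsc)) {
                print "only in b37: " $1
            } else if (ucsc[$1] != $2) {
                print "length differs: " $1 " (" ucsc[$1] " vs " $2 ")"
            }
            seen[$1] = 1
        }
        # Anything left over exists only in the UCSC index.
        END {
            for (n in ucsc) if (!(n in seen)) print "only in UCSC: " n
        }
    ' "$1" "$2"
}
```

For example, compare_fai hg19.fa.fai human_g1k_v37.fasta.fai (hypothetical file names); the order of the "only in UCSC" lines is not defined.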
>>
>> Sometimes genomes are similar enough that a dataset sourced from one can
>> be used with another, if the database attribute is changed and the data from
>> the regions that differ is removed. This may be possible in your case; only
>> trying will tell you how difficult it actually is with your analysis.
>> The GATK pipeline is very sensitive to exact inputs. You will need to be
>> careful with genome database assignments, etc. Following the links on the
>> tool forms to the GATK help pages can provide some more detail about
>> expected inputs, if this is something that you are going to try.
>>
>> Good luck with the re-run!
>>
>> Jen
>> Galaxy team
>>
>>
>> On 6/18/12 4:42 AM, Lilach Friedman wrote:
>>
>> Hi,
>> I am trying to use Depth of Coverage to see the coverage in specific
>> intervals.
>> The intervals were taken from UCSC (exons of 2 genes), loaded into Galaxy,
>> and the file type was changed to interval.
>>
>> I gave Depth of Coverage two BAM files (produced by BWA, then selecting
>> only the rows matching the pattern XT:A:U, then SAM-to-BAM)
>> and the intervals file (in the advanced GATK options).
>> The reference genome is hg_g1k_v37.
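The XT:A:U filtering step described above can also be reproduced outside Galaxy. This is a sketch, assuming BWA aln/samse/sampe-style SAM output in which uniquely placed reads carry the XT:A:U tag; the function name keep_unique_bwa is hypothetical. Unlike a plain line filter, it keeps the @ header lines, which downstream tools rely on to identify the reference contigs.

```shell
# keep_unique_bwa IN.sam OUT.sam   (hypothetical helper name)
# Keeps every header line (starting with '@') plus the alignment lines that
# carry the BWA tag XT:A:U as a whole tab-delimited field.
keep_unique_bwa() {
    awk -F'\t' '
        /^@/ { print; next }
        {
            # Optional tags start at column 12 in SAM.
            for (i = 12; i <= NF; i++)
                if ($i == "XT:A:U") { print; break }
        }
    ' "$1" > "$2"
}
```

The result can then be converted with, for example, samtools view -b -o unique.bam unique.sam (recent samtools versions detect SAM input automatically).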
>>
>> I got the following error message:
>>
>> An error occurred running this job: Picked up _JAVA_OPTIONS:
>> -Djava.io.tmpdir=/space/g2main
>> ##### ERROR
>> ------------------------------------------------------------------------------------------
>> ##### ERROR A USER ERROR has occurred (version 1.4-18-g80a4ce0):
>> ##### ERROR The invalid argume
>>
>>
>> Is it a bug, or did I do something wrong?
>>
>> I will be grateful for any help.
>>
>> Thanks!
>>    Lilach
>>
>>
>> ___________________________________________________________
>> The Galaxy User list should be used for the discussion of
>> Galaxy analysis and other features on the public server
>> at usegalaxy.org.  Please keep all replies on the list by
>> using "reply all" in your mail client.  For discussion of
>> local Galaxy instances and the Galaxy source code, please
>> use the Galaxy Development list:
>>
>>   http://lists.bx.psu.edu/listinfo/galaxy-dev
>>
>> To manage your subscriptions to this and other Galaxy lists,
>> please use the interface at:
>>
>>   http://lists.bx.psu.edu/
>>
>>
>> --
>> Jennifer Jackson
>> http://galaxyproject.org
>
>
> --
> Jennifer Jackson
> http://galaxyproject.org
>
>