Hi Felix,
On 8/13/20 21:43, Felix Ernst wrote:
Hi Leonard, Hi Herve,
I followed your conversation, since I have noticed the same problem. Thanks,
Herve, for the explanation of the recent changes on hg19.
The GRCh37.p13 sequence report states in its last line:
MT assembled-molecule MT Mitochondrion J01415.2 = NC_012920.1 non-nuclear 16569 chrM
Since the last column is called "UCSC-style-name", wouldn't that mean that chrM
has to be renamed to MT and not chrMT?
This is a mistake in the sequence report for GRCh37.p13. GRCh37.p13:MT
is the same as hg19:chrMT, not hg19:chrM.
hg19:chrM and hg19:chrMT are **not** the same sequences. The former is
NC_001807 and has length 16571 and the latter is NC_012920.1 and has
length 16569.
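A minimal sketch (plain Python, not Bioconductor code) of the distinction above: the two hg19 mitochondrial entries can be told apart by sequence length alone. The accessions and lengths are the ones quoted in this thread; the table `HG19_MITO` and the helper `identify_hg19_mito` are hypothetical names used only for illustration.

```python
# Accessions and lengths as quoted in this thread. The table and the helper
# below are illustrative only and are not part of any library.
HG19_MITO = {
    "chrM": {"accession": "NC_001807", "length": 16571},
    "chrMT": {"accession": "NC_012920.1", "length": 16569},
}


def identify_hg19_mito(length):
    """Return the hg19 names whose mitochondrial sequence has this length."""
    return [name for name, rec in HG19_MITO.items() if rec["length"] == length]
```

Checking the length of the mitochondrial sequence in an existing data set this way should tell whether it was built against the old chrM or against the GRCh37.p13 sequence.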
Yes, seqlevelsStyle() is sorting out all this mess for you ;-)
Cheers,
H.
Thanks again for the explanation.
Cheers,
Felix
-----Original Message-----
From: Bioc-devel <bioc-devel-boun...@r-project.org> On Behalf Of Hervé Pagès
Sent: Friday, 14 August 2020 01:08
To: Leonard Goldstein <goldstein.leon...@gene.com>; bioc-devel@r-project.org
Cc: charlotte.sone...@fmi.ch
Subject: Re: [Bioc-devel] BSgenome changes
Hi Leonard,
On 8/12/20 15:22, Leonard Goldstein via Bioc-devel wrote:
Dear Bioc team,
I'm following up on this recent GitHub issue
<https://github.com/ldg21/SGSeq/issues/5>.
Please see the issue for more details and code examples.
It looks like changes in Bioc devel result in two copies of the
mitochondrial chromosome for BSgenome.Hsapiens.UCSC.hg19 -- one named
chrM like in previous package versions (length 16571) and one named
chrMT (length 16569).
When using seqlevelsStyle() to change chromosome names from UCSC to
NCBI format, this results in new behavior -- in the past chrM was
simply renamed MT, now the different sequence chrMT is used. Is this intended?
Absolutely intended.
There is a long story behind the unfortunate fate of the mitochondrial
chromosome in hg19. I'll try to keep it short.
When the UCSC folks released the hg19 browser more than 10 years ago, they
based it on assembly GRCh37:
https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.13
See sequence report for GRCh37:
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.13_GRCh37/GCF_000001405.13_GRCh37_assembly_report.txt
For some mysterious reason GRCh37 didn't include the mitochondrial chromosome
so the UCSC folks decided to use mitochondrial sequence
NC_001807 and called it chrM.
However, UCSC has recently decided to base hg19 on GRCh37.p13 instead of
GRCh37. A rather surprising move after many years of hg19 being based on the
latter.
https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.25/
See sequence report for GRCh37.p13:
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.25_GRCh37.p13/GCF_000001405.25_GRCh37.p13_assembly_report.txt
Note that GRCh37.p13 does include the mitochondrial chromosome. It's called MT
in the official sequence report above and chrMT in hg19.
At the same time the UCSC folks decided to keep chrM, so hg19 now contains 2
mitochondrial sequences: chrM and chrMT. Previously it had only one: chrM.
So what you see in BioC devel in BSgenome.Hsapiens.UCSC.hg19 and with
seqlevelsStyle(genome) is only reflecting this. In particular
seqlevelsStyle(genome) <- "NCBI" now does the following:
- Rename chrMT -> MT.
- chrM does NOT get renamed. There is no point in renaming this sequence
because it has no equivalent in GRCh37.p13.
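The renaming behavior described above can be sketched as a simple lookup in which names without an NCBI equivalent pass through unchanged. This is a plain Python illustration, not the GenomeInfoDb implementation; the table `UCSC_TO_NCBI` is a hypothetical three-entry excerpt and `to_ncbi_style` is a made-up helper name.

```python
# Hypothetical excerpt of a UCSC -> NCBI name table for hg19 / GRCh37.p13.
# chrM is deliberately absent: per the explanation above, it has no
# equivalent sequence in GRCh37.p13.
UCSC_TO_NCBI = {"chr1": "1", "chrX": "X", "chrMT": "MT"}


def to_ncbi_style(names):
    """Rename sequences that have an NCBI equivalent; leave the rest as-is."""
    return [UCSC_TO_NCBI.get(name, name) for name in names]


print(to_ncbi_style(["chr1", "chrM", "chrMT"]))  # prints ['1', 'chrM', 'MT']
```

Under this scheme chrM keeps its UCSC-style name after the switch, which is why it can no longer end up as MT.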
Hope this helps,
H.
Leonard
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpa...@fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319