Kasper,

The tradition so far has been to package all UCSC human genomes since hg17. We could also start producing BSgenome packages for other, non-UCSC human assemblies; we just need to draw the line somewhere. If there is a need for it, we can make BSgenome.Hsapiens.NCBI.GRCh37.p13 available, as I said earlier. Is this what you are asking for?

H.

On 8/20/20 03:23, Kasper Daniel Hansen wrote:
Well, the presence of two mitochondrial genomes is to fix a mistake by UCSC. I can appreciate the importance of representing this mistake when you build off UCSC. But it strikes me as not actually representing the GRCh37 version of the genome, and it seems to me that we want such a representation in the project; not everything comes through UCSC. But perhaps I have not given this sufficient thought; this is just my immediate reaction.

On Tue, Aug 18, 2020 at 8:18 PM Leonard Goldstein <goldstein.leon...@gene.com> wrote:

    Thanks for the explanation Hervé.

    Best wishes

    Leonard


    On Tue, Aug 18, 2020 at 10:06 AM Hervé Pagès <hpa...@fredhutch.org> wrote:

        On 8/18/20 01:40, Kasper Daniel Hansen wrote:
         > In light of this, could we get a version of GRCh37 with only a single
         > mitochondrial genome?

        You mean a BSgenome.Hsapiens.NCBI.GRCh37.p13 package? So it would
        contain the same sequences as BSgenome.Hsapiens.UCSC.hg19 but without
        the hg19:chrM sequence?

        Certainly doable but note that by using BSgenome.Hsapiens.UCSC.hg38 you
        stay away from this mess. I'm not sure that adding yet another BSgenome
        package would make the situation less confusing.

         >
         > On Fri, Aug 14, 2020 at 6:17 PM Hervé Pagès <hpa...@fredhutch.org> wrote:
         >
         >     Hi Felix,
         >
         >     On 8/13/20 21:43, Felix Ernst wrote:
         >      > Hi Leonard, Hi Herve,
         >      >
         >      > I followed your conversation, since I have noticed the same
         >      > problem. Thanks, Herve, for the explanation of the recent changes on
         >      > hg19.
         >      >
         >      > The GRCh37.p13 report states in its last line:
         >      >
         >      > MT  assembled-molecule  MT  Mitochondrion  J01415.2  =  NC_012920.1  non-nuclear  16569  chrM
         >      >
         >      > Since the last column is called "UCSC-style-name", wouldn't that
         >      > mean that it is chrM, not chrMT, that has to be renamed to MT?
         >
         >     This is a mistake in the sequence report for GRCh37.p13. GRCh37.p13:MT
         >     is the same as hg19:chrMT, not hg19:chrM.
         >
         >     hg19:chrM and hg19:chrMT are **not** the same sequences. The former is
         >     NC_001807 and has length 16571 and the latter is NC_012920.1 and has
         >     length 16569.
         >
         >     Yes, seqlevelsStyle() is sorting out all this mess for you ;-)
         >
         >     Cheers,
         >     H.
         >
         >      >
         >      > Thanks again for the explanation.
         >      >
         >      > Cheers,
         >      > Felix
         >      >
         >      > -----Original Message-----
         >      > From: Bioc-devel <bioc-devel-boun...@r-project.org> On Behalf Of Hervé Pagès
         >      > Sent: Friday, August 14, 2020 01:08
         >      > To: Leonard Goldstein <goldstein.leon...@gene.com>; bioc-devel@r-project.org
         >      > Cc: charlotte.sone...@fmi.ch
         >      > Subject: Re: [Bioc-devel] BSgenome changes
         >      >
         >      > Hi Leonard,
         >      >
         >      > On 8/12/20 15:22, Leonard Goldstein via Bioc-devel wrote:
         >      >> Dear Bioc team,
         >      >>
         >      >> I'm following up on this recent GitHub issue
         >      >> <https://github.com/ldg21/SGSeq/issues/5>. Please see the issue
         >      >> for more details and code examples.
         >      >>
         >      >> It looks like changes in Bioc devel result in two copies of the
         >      >> mitochondrial chromosome for BSgenome.Hsapiens.UCSC.hg19 -- one named
         >      >> chrM like in previous package versions (length 16571) and one named
         >      >> chrMT (length 16569).
         >      >>
         >      >> When using seqlevelsStyle() to change chromosome names from UCSC to
         >      >> NCBI format, this results in new behavior -- in the past chrM was
         >      >> simply renamed MT, now the different sequence chrMT is used. Is
         >      >> this intended?
         >      >
         >      > Absolutely intended.
         >      >
         >      > There is a long story behind the unfortunate fate of the
         >      > mitochondrial chromosome in hg19. I'll try to keep it short.
         >      >
         >      > When the UCSC folks released the hg19 browser more than 10 years
         >      > ago, they based it on assembly GRCh37:
         >      >
         >      > https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.13
         >      >
         >      > See sequence report for GRCh37:
         >      >
         >      > https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.13_GRCh37/GCF_000001405.13_GRCh37_assembly_report.txt
         >      >
         >      > For some mysterious reason GRCh37 didn't include the
         >      > mitochondrial chromosome so the UCSC folks decided to use
         >      > mitochondrial sequence NC_001807 and called it chrM.
         >      >
         >      > However, UCSC has recently decided to base hg19 on GRCh37.p13
         >      > instead of GRCh37. A rather surprising move after many years of hg19
         >      > being based on the latter.
         >      >
         >      > https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.25/
         >      >
         >      > See sequence report for GRCh37.p13:
         >      >
         >      > https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.25_GRCh37.p13/GCF_000001405.25_GRCh37.p13_assembly_report.txt
         >      >
         >      > Note that GRCh37.p13 does include the mitochondrial chromosome.
         >      > It's called MT in the official sequence report above and chrMT in hg19.
         >      >
         >      > At the same time the UCSC folks decided to keep chrM, so now hg19
         >      > contains 2 mitochondrial sequences: chrM and chrMT. Previously it
         >      > had only one: chrM.
         >      >
         >      > So what you see in BioC devel in BSgenome.Hsapiens.UCSC.hg19 and with
         >      > seqlevelsStyle(genome) is only reflecting this. In particular
         >      > seqlevelsStyle(genome) <- "NCBI" now does the following:
         >      >
         >      >     - Rename chrMT -> MT.
         >      >
         >      >     - chrM does NOT get renamed. There is no point in renaming
         >      >       this sequence because it has no equivalent in GRCh37.p13.
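
The two renaming rules above can be modelled in a few lines of plain Python. This is only an illustrative sketch and not Bioconductor code: the names HG19_MITO_LENGTHS, UCSC_TO_NCBI and to_ncbi_style are hypothetical, and the mapping holds only the single pair stated in this thread.

```python
# Plain-Python model of the renaming described above. This is NOT the
# Bioconductor API; all names below are hypothetical and for illustration only.

# Lengths quoted in this thread for the two hg19 mitochondrial sequences.
HG19_MITO_LENGTHS = {"chrM": 16571, "chrMT": 16569}

# UCSC -> NCBI name pairs stated in this thread. Only chrMT has a
# counterpart in GRCh37.p13 (MT); chrM is deliberately left out.
UCSC_TO_NCBI = {"chrMT": "MT"}


def to_ncbi_style(seqnames):
    """Rename UCSC-style names to NCBI style; names without a mapping are kept."""
    return [UCSC_TO_NCBI.get(name, name) for name in seqnames]


renamed = to_ncbi_style(["chrM", "chrMT"])
print(renamed)  # ['chrM', 'MT']
```

In this model chrM survives the style change under its UCSC name, which matches the point made above: it has no equivalent in GRCh37.p13, so nothing sensible exists to rename it to.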
         >      >
         >      > Hope this helps,
         >      >
         >      > H.
         >      >
         >      >>
         >      >> Leonard
         >      >>
         >      >> _______________________________________________
         >      >> Bioc-devel@r-project.org mailing list
         >      >> https://stat.ethz.ch/mailman/listinfo/bioc-devel
         >      >>
         >      >
         >      > --
         >      > Hervé Pagès
         >      >
         >      > Program in Computational Biology
         >      > Division of Public Health Sciences
         >      > Fred Hutchinson Cancer Research Center
         >      > 1100 Fairview Ave. N, M1-B514
         >      > P.O. Box 19024
         >      > Seattle, WA 98109-1024
         >      >
         >      > E-mail: hpa...@fredhutch.org
         >      > Phone:  (206) 667-5791
         >      > Fax:    (206) 667-1319
         >      >
         >
         > --
         > Best,
         > Kasper



