If the chromosome name depends on the assembly, that makes GenomeInfoDb
even more useful and necessary, provided it is supported, of course.
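To make the assembly dependence concrete, here is a minimal base-R sketch. It assumes a small hand-written lookup table (`refseq_map`) and a helper (`ucsc_to_refseq()`). Both are hypothetical and are not part of GenomeInfoDb. The accessions are the RefSeq names for chromosomes 1 and 17 in GRCh38 and GRCh37. Because the same UCSC name (`chr17`) corresponds to `NC_000017.11` in GRCh38 but `NC_000017.10` in GRCh37, the sketch makes the assembly a required argument instead of a property of the style alone.

```r
# Hypothetical lookup table (NOT part of GenomeInfoDb): RefSeq accessions for
# chromosomes 1 and 17 in two human assemblies.
refseq_map <- data.frame(
  assembly = c("GRCh38", "GRCh38", "GRCh37", "GRCh37"),
  UCSC     = c("chr1", "chr17", "chr1", "chr17"),
  RefSeq   = c("NC_000001.11", "NC_000017.11", "NC_000001.10", "NC_000017.10"),
  stringsAsFactors = FALSE  # keep character columns on R < 4.0
)

# Hypothetical helper: the same UCSC name maps to a different RefSeq accession
# in each assembly, so the caller must say which assembly is meant.
ucsc_to_refseq <- function(seqnames, assembly, map = refseq_map) {
  rows <- map[map$assembly == assembly, , drop = FALSE]
  out <- rows$RefSeq[match(seqnames, rows$UCSC)]
  if (anyNA(out)) {
    # Unknown assembly or unmapped chromosome: fail loudly, do not guess.
    stop("no ", assembly, " RefSeq accession for: ",
         paste(seqnames[is.na(out)], collapse = ", "))
  }
  out
}

ucsc_to_refseq("chr17", "GRCh38")  # "NC_000017.11"
ucsc_to_refseq("chr17", "GRCh37")  # "NC_000017.10"
```

The resulting character vector could then be used to rename the seqlevels of a GRanges (for example with GenomeInfoDb's `renameSeqlevels()`). Whether `seqlevelsStyle()` itself should carry per-assembly tables like this is the open design question in the thread.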

On Fri, Dec 13, 2019 at 11:45 AM Vincent Carey <st...@channing.harvard.edu>
wrote:

> I tried an inline png but I think it was rejected by bioc-devel.  Here's
> another try.
>
> On Fri, Dec 13, 2019 at 11:40 AM Vincent Carey <st...@channing.harvard.edu>
> wrote:
>
> > Thanks. It is good to know more about the complications of adding
> > seqlevelsStyle elements.
> > I am not sure how pervasive this will be in SNP annotation in the future.
> > The "new API" for dbSNP references SPDI annotation conventions:
> >
> > https://api.ncbi.nlm.nih.gov/variation/v0/
> >
> > At least one dbSNP build 152 resource uses this nomenclature. The one
> > referenced below is the "go-to" resource for current rsID-coordinate
> > correspondence, as far as I know.
> >
> > > library(VariantAnnotation)
> > 0/0 packages newly attached/loaded, see sessionInfo() for details.
> > > mypar = GRanges("NC_000001.11", IRanges(100000,120000)) # note seqnames
> > > nn = readVcf("ftp://ftp.ncbi.nih.gov/snp/redesign/latest_release/VCF/GCF_000001405.38.gz",
> > +   genome="GRCh38", param=mypar)
> > > head(rowRanges(nn), 3)
> > GRanges object with 3 ranges and 5 metadata columns:
> >                    seqnames    ranges strand | paramRangeID            REF
> >                       <Rle> <IRanges>  <Rle> |     <factor> <DNAStringSet>
> >   rs1331956057 NC_000001.11    100000      * |         <NA>              C
> >   rs1252351580 NC_000001.11    100036      * |         <NA>              T
> >   rs1238523913 NC_000001.11    100051      * |         <NA>              T
> >                               ALT      QUAL      FILTER
> >                <DNAStringSetList> <numeric> <character>
> >   rs1331956057                  T      <NA>           .
> >   rs1252351580                  G      <NA>           .
> >   rs1238523913                  C      <NA>           .
> >   -------
> >   seqinfo: 1 sequence from GRCh38 genome; no seqlengths
> >
> >
> > On Fri, Dec 13, 2019 at 11:01 AM Robert Castelo <robert.cast...@upf.edu>
> > wrote:
> >
> >> hi Hervé,
> >>
> >> i didn't know about this new sequence style until Vince posted his
> >> message and we briefly talked about it at the European BioC meeting this
> >> week in Brussels. however, i didn't know that the style was specific to
> >> a particular assembly. i have no use case for this at the moment,
> >> i.e., i have not encountered myself any annotation or BAM file with
> >> chromosome names written that way, so i don't know how pressing this
> >> issue is, maybe Vince can tell us how spread such chromosome naming
> >> style may become in the near future.
> >>
> >> naively, i'd think that it would be a matter of adding a
> >> reference-specific column, i.e., 'GRCh38.p13', 'GRCh37.p13', etc., but i
> >> can imagine that maybe the "reference style" concept might not be the
> >> appropriate placeholder to map all different chromosome names of all
> >> different individual human genomes uploaded to NCBI. maybe we should
> >> wait until we have a specific use case .. Vince?
> >>
> >> robert.
> >>
> >> On 12/11/19 10:06 PM, Pages, Herve wrote:
> >> > Hi Vince, Robert,
> >> >
> >> > Looks like Vince wants the RefSeq accession, e.g. NC_000017.11 for
> >> > chrom 17 in GRCh38.
> >> >
> >> > @Robert: Is this what you're also interested in?
> >> >
> >> > The problem is that the RefSeq accessions are specific to a particular
> >> > assembly (e.g. NC_000017.11 for chrom 17 in GRCh38 but NC_000017.10 for
> >> > the same chrom in GRCh37).
> >> >
> >> > Currently seqlevelsStyle() doesn't know how to distinguish between
> >> > different assemblies of the same organism. Not saying it couldn't, but it
> >> > would require some thinking and some significant refactoring. It
> >> > wouldn't be just a matter of adding a column to
> >> > genomeStyles()$Homo_sapiens.
> >> >
> >> > H.
> >> >
> >> >
> >> > On 12/10/19 14:19, Robert Castelo wrote:
> >> >> I second this, and would suggest naming the style 'GRC' for "Genome
> >> >> Reference Consortium".
> >> >>
> >> >> thanks Vince for bringing this up; being able to easily switch between
> >> >> genome styles is great.
> >> >>
> >> >> if 'paste0()' in R is one of the most influential contributions to
> >> >> statistical computing
> >> >>
> >> >> https://simplystatistics.org/2013/01/31/paste0-is-statistical-computings-most-influential-contribution-of-the-21st-century
> >> >>
> >> >> i think that 'seqlevelsStyle()' from the GenomeInfoDb package is one of
> >> >> the most influential contributions to human genetics, if you think about
> >> >> the time invested by researchers in parsing and changing between
> >> >> different styles of chromosome names :)
> >> >>
> >> >> robert.
> >> >>
> >> >> On 06/12/2019 15:03, Vincent Carey wrote:
> >> >>> I raised this issue previously with little response.
> >> >>>
> >> >>> I'd propose that we add a column or two to genomeStyles()$Homo_sapiens
> >> >>>
> >> >>>> head(genomeStyles()$Homo_sapiens, 2)
> >> >>>   circular auto   sex NCBI UCSC dbSNP Ensembl
> >> >>> 1    FALSE TRUE FALSE    1 chr1   ch1       1
> >> >>> 2    FALSE TRUE FALSE    2 chr2   ch2       2
> >> >>>
> >> >>> that includes the values for "NCBI reference sequence names".
> >> >>>
> >> >>> See https://www.ncbi.nlm.nih.gov/nuccore/568815581 for one report on
> >> >>> chr17, and https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.39 for
> >> >>> a table that includes the GenBank labels.
> >> >>>
> >> >>> Should I just file a PR at
> >> >>> https://github.com/Bioconductor/GenomeInfoDb/ after testing?
> >> >>>
> >> >>
> >> >> _______________________________________________
> >> >> Bioc-devel@r-project.org mailing list
> >> >> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >> >>
> >> >
> >>
> >> --
> >> Robert Castelo, PhD
> >> Associate Professor
> >> Dept. of Experimental and Health Sciences
> >> Universitat Pompeu Fabra (UPF)
> >> Barcelona Biomedical Research Park (PRBB)
> >> Dr Aiguader 88
> >> E-08003 Barcelona, Spain
> >> telf: +34.933.160.514
> >> fax: +34.933.160.550
> >>
> >
>

