Hi Robert,

The VCF file uses "22" for the chromosome name which is the name used by NCBI. So explicitly specifying "hg19" in the readVcf() call is like saying that this chromosome name is a UCSC name which is why seqlevelsStyle() gets confused later.

If you specify the name of the NCBI assembly, things work as expected:

  fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
  vcf <- readVcf(fl, "GRCh37")
  seqlevels(vcf)
  # [1] "22"
  seqlevelsStyle(vcf)
  # [1] "NCBI"
  seqlevelsStyle(vcf) <- "UCSC"
  seqlevels(vcf)
  # [1] "chr22"

Or, if you don't know what reference genome the file is based on, don't specify it:

  fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
  vcf <- readVcf(fl)
  seqlevels(vcf)
  # [1] "22"
  seqlevelsStyle(vcf)
  # [1] "NCBI"    "Ensembl"
  seqlevelsStyle(vcf) <- "UCSC"
  seqlevels(vcf)
  # [1] "chr22"

or specify it later:

  genome(vcf) <- "hg19"
  seqinfo(vcf)
  # Seqinfo object with 1 sequence from hg19 genome; no seqlengths:
  #   seqnames seqlengths isCircular genome
  #   chr22            NA         NA   hg19

Hope this helps,
H.


On 7/29/20 08:30, Robert Castelo wrote:
hi,

it looks like either VariantAnnotation::readVcf() or something in the CollapsedVCF class broke in devel with respect to reading and setting sequence styles:

library(VariantAnnotation)

fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
vcf <- readVcf(fl, "hg19")
seqlevels(vcf)
[1] "22"
seqlevelsStyle(vcf)
[1] "UCSC"
seqlevelsStyle(vcf) <- "UCSC"
seqlevels(vcf)
[1] "22"

you can find my session information below. let me know if you want me to open an issue at the GitHub repo (VariantAnnotatoin or GenomeInfoDb?).

thanks!

robert.

BiocManager::version()
[1] ‘3.12’
sessionInfo()
R version 4.0.0 (2020-04-24)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C
  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
  [1] VariantAnnotation_1.35.3 Rsamtools_2.5.3
  [3] Biostrings_2.57.2 XVector_0.29.3
  [5] SummarizedExperiment_1.19.6 DelayedArray_0.15.7
  [7] matrixStats_0.56.0 Matrix_1.2-18
  [9] Biobase_2.49.0 GenomicRanges_1.41.5
[11] GenomeInfoDb_1.25.8 IRanges_2.23.10
[13] S4Vectors_0.27.12 BiocGenerics_0.35.4
[15] BiocManager_1.30.10

loaded via a namespace (and not attached):
  [1] progress_1.2.2           tidyselect_1.1.0 purrr_0.3.4
  [4] lattice_0.20-41          vctrs_0.3.1 generics_0.0.2
  [7] BiocFileCache_1.13.0     rtracklayer_1.49.4 GenomicFeatures_1.41.2
[10] blob_1.2.1               XML_3.99-0.4 rlang_0.4.6
[13] pillar_1.4.4             glue_1.4.1 DBI_1.1.0
[16] rappdirs_0.3.1           BiocParallel_1.23.2 bit64_0.9-7.1
[19] dbplyr_1.4.4             GenomeInfoDbData_1.2.3 lifecycle_0.2.0
[22] stringr_1.4.0            zlibbioc_1.35.0 memoise_1.1.0
[25] biomaRt_2.45.2           curl_4.3 AnnotationDbi_1.51.3
[28] Rcpp_1.0.4.6             BSgenome_1.57.5 openssl_1.4.1
[31] bit_1.1-15.2             hms_0.5.3 askpass_1.1
[34] digest_0.6.25            stringi_1.4.6 dplyr_1.0.0
[37] grid_4.0.0               tools_4.0.0 bitops_1.0-6
[40] magrittr_1.5             RCurl_1.98-1.2 RSQLite_2.2.0
[43] tibble_3.0.1             crayon_1.3.4 pkgconfig_2.0.3
[46] ellipsis_0.3.1           prettyunits_1.1.1 assertthat_0.2.1
[49] httr_1.4.1               R6_2.4.1 GenomicAlignments_1.25.3
[52] compiler_4.0.0

_______________________________________________
Bioc-devel@r-project.org mailing list
https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwIDaQ&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=gp0KKC6W1uS1YnyFI5iSuxF5WSUpOhbHwL94GRP8yu0&s=Co1P5SErF64uPYhHMltM3De48hQLl-XHK3gfZOEnSKc&e=

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Reply via email to