[Bioc-devel] VariantAnnotation::readVcf(fl, seqinfo(scanVcfHeader(fl)) problem

Robert Castelo Thu, 23 Oct 2014 06:47:42 -0700

hi there,

in my package VariantFiltering i have an example VCF file from a HapmapCEU trio including three chromosomes only to illustrate its vignette.i've come across a problem with the function readVcf() inVariantAnnotation that may be specific of the situation of a VCF filenot having all chromosomes, but which it will be great for me if thiscould be addressed.


the problem is reproduced as follows:

===========================
library(VariantAnnotation)

fl <- file.path(system.file("extdata", package="VariantFiltering"),"CEUtrio.vcf.bgz")


vcf <- readVcf(fl, seqinfo(scanVcfHeader(fl)))

Error in GenomeInfoDb:::makeNewSeqnames(x, new2old = new2old,seqlevels(value)) :

  when 'new2old' is NULL, the first elements in the
  supplied 'seqlevels' must be identical to 'seqlevels(x)'
====================

this is caused because although i'm providing the Seqinfo object derivedfrom the header of the VCF file itself, at some point the ordering ofthe seqlevels between the header and the rest of the VCF file differsdue to the smaller subset of chromosomes in the VCF file.


This can be easily fixed by replacing the line:

                if (length(newsi) > length(oldsi)) {

within the .scanVcfToVCF() function in methods-readVcf.R, by

                if (length(newsi) >= length(oldsi)) {

this is happening both in release and devel. i'm pasting below mysessionInfo() for the release.

let me know if you think this fix is feasible or i'm wrongly using thefunction readVcf(). i'm basically trying to use readVcf() without havingto figure out the appropriate value for the argument 'genome', i.e.,without knowing beforehand what version of the genome was used toproduce the VCF file.


thanks!!
robert.


sessionInfo()
R version 3.1.1 Patched (2014-10-13 r66751)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF8        LC_COLLATE=en_US.UTF8
 [5] LC_MONETARY=en_US.UTF8    LC_MESSAGES=en_US.UTF8
 [7] LC_PAPER=en_US.UTF8       LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices
[6] utils     datasets  methods   base

other attached packages:
 [1] VariantAnnotation_1.12.0 Rsamtools_1.18.0
 [3] Biostrings_2.34.0        XVector_0.6.0
 [5] GenomicRanges_1.18.0     GenomeInfoDb_1.2.0
 [7] IRanges_2.0.0            S4Vectors_0.4.0
 [9] BiocGenerics_0.12.0      vimcom_1.0-0
[11] setwidth_1.0-3           colorout_1.0-3

loaded via a namespace (and not attached):
 [1] AnnotationDbi_1.28.0    base64enc_0.1-2
 [3] BatchJobs_1.4           BBmisc_1.7
 [5] Biobase_2.26.0          BiocParallel_1.0.0
 [7] biomaRt_2.22.0          bitops_1.0-6
 [9] brew_1.0-6              BSgenome_1.34.0
[11] checkmate_1.4           codetools_0.2-9
[13] DBI_0.3.1               digest_0.6.4
[15] fail_1.2                foreach_1.4.2
[17] GenomicAlignments_1.2.0 GenomicFeatures_1.18.0
[19] iterators_1.0.7         RCurl_1.95-4.3
[21] RSQLite_0.11.4          rtracklayer_1.26.0
[23] sendmailR_1.2-1         stringr_0.6.2
[25] tools_3.1.1             XML_3.98-1.1
[27] zlibbioc_1.12.0

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

[Bioc-devel] VariantAnnotation::readVcf(fl, seqinfo(scanVcfHeader(fl)) problem

Reply via email to