Not sure I understand. Setting the seqlevelsStyle cannot change the genome build, so the two Seqinfos will remain incompatible in that way. I think what you want is to be able to say "let's consider these two genome builds to be the same", which seems reasonable after dropping chrM. I was proposing that this could be done with something like:
genome(vcf, rectifySeqlevels=TRUE) <- genome(txdb) For convenience, we might want that argument to be TRUE by default, but that would change current behavior. On Fri, Oct 24, 2014 at 3:41 PM, Robert Castelo <robert.cast...@upf.edu> wrote: > hi Michael, > > if we assume that a seqname style does not imply a specific genome build, > then i'd say that the error below about having incompatible genomes should > not pop up because sequence styles have been already matched, right? > > > > On 10/24/14 10:22 PM, Michael Lawrence wrote: > > I don't think a seqname style implies a specific genome build. But the > inverse might make sense. Given a genome build identifier, we could check > for consistent naming. Perhaps an option on "genome<-" could support this? > > > > On Fri, Oct 24, 2014 at 11:52 AM, Valerie Obenchain <voben...@fhcrc.org> > wrote: > >> This is a good question. I'm not sure we want seqlevelsStyle() to also >> alter the genome value. I think it's a reasonable request but I'd like to >> open it up to discussion. I've cc'd a few others for input. >> >> Valerie >> >> >> >> On 10/24/14 09:05, Robert Castelo wrote: >> >>> hi Valerie, >>> >>> thanks for the quick fix and updating the documentation, i have a >>> further question about the seqinfo slot and particularly the use of >>> seqlevelsStyle(). Let me illustrate it with an example again: >>> >>> >>> ============== >>> library(VariantAnnotation) >>> library(TxDb.Hsapiens.UCSC.hg19.knownGene) >>> >>> ## read again the same VCF file >>> fl <- file.path(system.file("extdata", package="VariantFiltering"), >>> "CEUtrio.vcf.bgz") >>> vcf <- readVcf(fl, seqinfo(scanVcfHeader(fl))) >>> >>> txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene >>> >>> ## select the standard chromosomes >>> vcf <- keepStandardChromosomes(vcf) >>> >>> ## since the input VCF file had NCBI style, let's match >>> ## the style of the TxDb annotations >>> seqlevelsStyle(vcf) <- seqlevelsStyle(txdb) >>> >>> ## drop the mitochondrial chromosome (b/c of the different lengths >>> ## between b37 and hg19 >>> vcf <- dropSeqlevels(vcf, "chrM") >>> >>> ## try to annotate the location of the variants. it prompts an >>> ## error because the 'genome' slot of the Seqinfo object still >>> ## has b37 after running seqlevelsStyle >>> vcf_annot <- locateVariants(vcf, txdb, AllVariants()) >>> Error in mergeNamedAtomicVectors(genome(x), genome(y), what = >>> c("sequence", : >>> sequences chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, >>> chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, >>> chr20, chr21, chr22, chrX, chrY, chrM have incompatible genomes: >>> - in 'x': b37, b37, b37, b37, b37, b37, b37, b37, b37, b37, b37, b37, >>> b37, b37, b37, b37, b37, b37, b37, b37, b37, b37, b37, b37, b37 >>> - in 'y': hg19, hg19, hg19, hg19, hg19, hg19, hg19, hg19, hg19, hg19, >>> hg19, hg19, hg19, hg19, hg19, hg19, hg19, hg19, hg19, hg19, hg19, hg19, >>> hg19, hg19, hg19 >>> >>> ## this can be fixed by setting the 'genome' slot to the values of >>> ## the TxDb object >>> genome(vcf) <- genome(txdb)[intersect(names(genome(vcf)), >>> names(genome(txdb)))] >>> >>> ## now this works >>> vcf_annot <- locateVariants(vcf, txdb, AllVariants()) >>> ================= >>> >>> so my question is, should not seqlevelsStyle() also change the 'genome' >>> slot of the Seqinfo object in the updated object? >>> >>> if not, would the solution be updating the 'genome' slot in the way i >>> did it? >>> >>> thanks! >>> robert. >>> >>> >>> >>> On 10/23/2014 11:14 PM, Valerie Obenchain wrote: >>> >>>> Hi Robert, >>>> >>>> Thanks for the bug report and reproducible example. Now fixed in release >>>> 1.12.2 and devel 1.13.4. >>>> >>>> I've also updated the docs to better explain how the Seqinfo objects are >>>> propagated / merged when supplied as 'genome'. >>>> >>>> Valerie >>>> >>>> >>>> On 10/23/2014 06:45 AM, Robert Castelo wrote: >>>> >>>>> hi there, >>>>> >>>>> in my package VariantFiltering i have an example VCF file from a Hapmap >>>>> CEU trio including three chromosomes only to illustrate its vignette. >>>>> i've come across a problem with the function readVcf() in >>>>> VariantAnnotation that may be specific of the situation of a VCF file >>>>> not having all chromosomes, but which it will be great for me if this >>>>> could be addressed. >>>>> >>>>> the problem is reproduced as follows: >>>>> >>>>> =========================== >>>>> library(VariantAnnotation) >>>>> >>>>> fl <- file.path(system.file("extdata", package="VariantFiltering"), >>>>> "CEUtrio.vcf.bgz") >>>>> >>>>> vcf <- readVcf(fl, seqinfo(scanVcfHeader(fl))) >>>>> Error in GenomeInfoDb:::makeNewSeqnames(x, new2old = new2old, >>>>> seqlevels(value)) : >>>>> when 'new2old' is NULL, the first elements in the >>>>> supplied 'seqlevels' must be identical to 'seqlevels(x)' >>>>> ==================== >>>>> >>>>> this is caused because although i'm providing the Seqinfo object >>>>> derived >>>>> from the header of the VCF file itself, at some point the ordering of >>>>> the seqlevels between the header and the rest of the VCF file differs >>>>> due to the smaller subset of chromosomes in the VCF file. >>>>> >>>>> This can be easily fixed by replacing the line: >>>>> >>>>> if (length(newsi) > length(oldsi)) { >>>>> >>>>> within the .scanVcfToVCF() function in methods-readVcf.R, by >>>>> >>>>> if (length(newsi) >= length(oldsi)) { >>>>> >>>>> this is happening both in release and devel. i'm pasting below my >>>>> sessionInfo() for the release. >>>>> >>>>> let me know if you think this fix is feasible or i'm wrongly using the >>>>> function readVcf(). i'm basically trying to use readVcf() without >>>>> having >>>>> to figure out the appropriate value for the argument 'genome', i.e., >>>>> without knowing beforehand what version of the genome was used to >>>>> produce the VCF file. >>>>> >>>>> thanks!! >>>>> robert. >>>>> >>>>> >>>>> sessionInfo() >>>>> R version 3.1.1 Patched (2014-10-13 r66751) >>>>> Platform: x86_64-unknown-linux-gnu (64-bit) >>>>> >>>>> locale: >>>>> [1] LC_CTYPE=en_US.UTF8 LC_NUMERIC=C >>>>> [3] LC_TIME=en_US.UTF8 LC_COLLATE=en_US.UTF8 >>>>> [5] LC_MONETARY=en_US.UTF8 LC_MESSAGES=en_US.UTF8 >>>>> [7] LC_PAPER=en_US.UTF8 LC_NAME=C >>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>>>> [11] LC_MEASUREMENT=en_US.UTF8 LC_IDENTIFICATION=C >>>>> >>>>> attached base packages: >>>>> [1] stats4 parallel stats graphics grDevices >>>>> [6] utils datasets methods base >>>>> >>>>> other attached packages: >>>>> [1] VariantAnnotation_1.12.0 Rsamtools_1.18.0 >>>>> [3] Biostrings_2.34.0 XVector_0.6.0 >>>>> [5] GenomicRanges_1.18.0 GenomeInfoDb_1.2.0 >>>>> [7] IRanges_2.0.0 S4Vectors_0.4.0 >>>>> [9] BiocGenerics_0.12.0 vimcom_1.0-0 >>>>> [11] setwidth_1.0-3 colorout_1.0-3 >>>>> >>>>> loaded via a namespace (and not attached): >>>>> [1] AnnotationDbi_1.28.0 base64enc_0.1-2 >>>>> [3] BatchJobs_1.4 BBmisc_1.7 >>>>> [5] Biobase_2.26.0 BiocParallel_1.0.0 >>>>> [7] biomaRt_2.22.0 bitops_1.0-6 >>>>> [9] brew_1.0-6 BSgenome_1.34.0 >>>>> [11] checkmate_1.4 codetools_0.2-9 >>>>> [13] DBI_0.3.1 digest_0.6.4 >>>>> [15] fail_1.2 foreach_1.4.2 >>>>> [17] GenomicAlignments_1.2.0 GenomicFeatures_1.18.0 >>>>> [19] iterators_1.0.7 RCurl_1.95-4.3 >>>>> [21] RSQLite_0.11.4 rtracklayer_1.26.0 >>>>> [23] sendmailR_1.2-1 stringr_0.6.2 >>>>> [25] tools_3.1.1 XML_3.98-1.1 >>>>> [27] zlibbioc_1.12.0 >>>>> >>>>> _______________________________________________ >>>>> Bioc-devel@r-project.org mailing list >>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel >>>>> >>>> >>>> >>>> >>> >> > > [[alternative HTML version deleted]] _______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel