Files with a .vcf extension often get scrubbed from email because they are interpreted as a 'vCard' file. (At least this has been my experience.) Changing the file extension to something other than '.vcf' usually solves the problem.

I was able to reproduce the error with the file you pasted in the message. The bug was caused by some old code that looked for ":" in the rownames. This was a legacy check and is no longer necessary (I should have removed it some time ago). It is now fixed in release (1.8.7) and devel (1.9.16).
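
To illustrate the kind of check described above, here is a minimal base-R sketch. It is not the actual VariantAnnotation source: the function names `legacy_id_column` and `strict_id_column` are hypothetical, and the stricter pattern is only an assumption about how auto-generated rownames of the form chr:pos_ref/alt could be told apart from genuine IDs that happen to contain ":".

```r
# Hypothetical helpers for illustration only; NOT the VariantAnnotation code.

# Legacy-style check: any rowname containing ":" is assumed to be an
# auto-generated "chr:pos_ref/alt" name, so the ID is written out as ".".
legacy_id_column <- function(rn) {
  ifelse(grepl(":", rn, fixed = TRUE), ".", rn)
}

# Stricter check (an assumption, not the released fix): only names that
# match the full "chr:pos_ref/alt" shape are treated as auto-generated.
strict_id_column <- function(rn) {
  auto <- grepl("^[^:]+:[0-9]+_[^/]+/.+$", rn)
  ifelse(auto, ".", rn)
}

rn <- c("rs6054257", "20:17330_T/A",
        "DEL:561590:0:1:0:0:0", "DUP:TANDEM:572544:0:0:8:0:0")

legacy_id_column(rn)  # the two structural-variant IDs are lost
strict_id_column(rn)  # the two structural-variant IDs are kept
```

With the legacy-style check, every ID containing ":" is replaced by ".", which matches the behaviour reported in this thread. The stricter pattern keeps those IDs, although an ID that happened to have exactly the chr:pos_ref/alt shape would still be dropped.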

Thanks for persevering and reporting this bug.

Valerie


On 11/28/2013 02:13 AM, Becq, Jennifer wrote:
Hi Valerie,
The VCF that is causing the problem was at the bottom of my email, I can 
copy-paste it here again:

##fileformat=VCFv4.1
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
chr20   14855644        DEL:561590:0:1:0:0:0    C       <DEL>   .       PASS    .
chr20   29627290        BND:81424:0:1:1:1       G       [chr2:114173319[G       .       MaxDepth        .
chr20   35365307        BND:54200:0:1:0:1       T       ]chr1:230941520]T       .       PASS    .
chr20   60520225        DEL:572151:1:1:6:4:0    AACGATGAGGAGCATCGCGGCTGTCTGCACCATGGGAGCCCCTTCTCACTGACAATGAGGAGCATTCAGAGTGTCTACACCGTGGCCACGCCTTCTCACCGATGCTGAGGAGCACCGAGACTGTCTGCACTGTGGCCGCCCCTTCTCACCG  A       .       PASS    .
chr20   60520443        DEL:572151:1:1:6:4:1    GACTGTCTGCACCGTGGCCGCCCCTTCTCACTGACGATGAGGAGCACTGCGACTGTCTGCACCGTGGCCGCCCTTTCTGACTGATGATAAGGAACATTGCGACTGTCTGCACCGTGGCTGCCCCTTCTCACCAACGCTGAGGAGCACTGCAACCATCTGCACCGTGGCCGCCCCTTCTCACCGATGATGAGGAACATTGAGACTGTCTGCCCCGTGGCTGCCCCTTCTCACCGATGCTGAGGAGCACTGTGACTGTCTGCACCATGGGAGCCCCTTCTCACTGACAATGAGGAGCATTCAGAGTGTCTACACCGTGGCCGCGCCTTCTCACCGATGCTGAGGAGCACCGAGACTGTCTGCACCGTGGCCGCCCCTTCTCACCGATGACGAGGAGCACTGCGA  GC      .       PASS    .
chr20   60520937        DEL:572151:1:1:11:0:0   C       <DEL>   .       PASS    .
chr20   61766068        DEL:572433:0:0:5:2:0    CAGAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCAGGGCCACAGGGGAGGCAGGGCCCAGAGAGGAGGCGGGGCCACAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCAGGGCCAC  CGG     .       PASS    .
chr20   62063686        DUP:TANDEM:572544:0:0:8:0:0     T       <DUP:TANDEM>    .       PASS    .





Thanks
Jennifer

Jennifer Becq
Senior Bioinformatics Scientist
Illumina Cambridge Ltd
Tel: +44 (0) 1799 532300
email: jb...@illumina.com



-----Original Message-----
From: Valerie Obenchain [mailto:voben...@fhcrc.org]
Sent: 27 November 2013 21:17
To: Becq, Jennifer; bioc-devel@r-project.org
Subject: Re: VariantAnnotation writeVcf problem

Hi,

I can't reproduce this error. Here is a read/write example using a file from 
VariantAnnotation where the results are as expected.

fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
dest <- tempfile()
vcf1 <- readVcf(fl, "hg19")
  > rownames(vcf1)
[1] "rs6054257"      "20:17330_T/A"   "rs6040355"      "20:1230237_T/."
[5] "microsat1"

writeVcf(vcf1, dest)
vcf2 <- readVcf(dest, "hg19")
  > rownames(vcf2)
[1] "rs6054257"      "20:17330_T/A"   "rs6040355"      "20:1230237_T/."
[5] "microsat1"

I need a reproducible example in order to help. Is the vcf you're working with 
publicly available?

Valerie

On 11/27/2013 03:37 AM, Becq, Jennifer wrote:
Hi Valerie,

Thank you for cc'ing my message.

The "ID" values are removed when reading a VCF through readVcf() and re-writing 
it with writeVcf():

V = readVcf("test.vcf", "hg19")
rownames(V)
[1] "DEL:561590:0:1:0:0:0"        "BND:81424:0:1:1:1"
[3] "BND:54200:0:1:0:1"           "DEL:572151:1:1:6:4:0"
[5] "DEL:572151:1:1:6:4:1"        "DEL:572151:1:1:11:0:0"
[7] "DEL:572433:0:0:5:2:0"        "DUP:TANDEM:572544:0:0:8:0:0"
writeVcf(V, "writeTest.vcf")
V2 = readVcf("writeTest.vcf", "hg19")
rownames(V2)
[1] "chr20:14855644_C/<DEL>"
[2] "chr20:29627290_G/[chr2:114173319[G"
[3] "chr20:35365307_T/]chr1:230941520]T"
[4] 
"chr20:60520225_AACGATGAGGAGCATCGCGGCTGTCTGCACCATGGGAGCCCCTTCTCACTGACAATGAGGAGCATTCAGAGTGTCTACACCGTGGCCACGCCTTCTCACCGATGCTGAGGAGCACCGAGACTGTCTGCACTGTGGCCGCCCCTTCTCACCG/A"
[5] 
"chr20:60520443_GACTGTCTGCACCGTGGCCGCCCCTTCTCACTGACGATGAGGAGCACTGCGACTGTCTGCACCGTGGCCGCCCTTTCTGACTGATGATAAGGAACATTGCGACTGTCTGCACCGTGGCTGCCCCTTCTCACCAACGCTGAGGAGCACTGCAACCATCTGCACCGTGGCCGCCCCTTCTCACCGATGATGAGGAACATTGAGACTGTCTGCCCCGTGGCTGCCCCTTCTCACCGATGCTGAGGAGCACTGTGACTGTCTGCACCATGGGAGCCCCTTCTCACTGACAATGAGGAGCATTCAGAGTGTCTACACCGTGGCCGCGCCTTCTCACCGATGCTGAGGAGCACCGAGACTGTCTGCACCGTGGCCGCCCCTTCTCACCGATGACGAGGAGCACTGCGA/GC"
[6] "chr20:60520937_C/<DEL>"
[7] 
"chr20:61766068_CAGAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCAGGGCCACAGGGGAGGCAGGGCCCAGAGAGGAGGCGGGGCCACAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCAGGGCCAC/CGG"
[8] "chr20:62063686_T/<DUP:TANDEM>"

sessionInfo()
R version 3.0.2 Patched (2013-10-27 r64116)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
   [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] VariantAnnotation_1.8.6 Rsamtools_1.14.2        Biostrings_2.30.1
[4] GenomicRanges_1.14.3    XVector_0.2.0           IRanges_1.20.5
[7] BiocGenerics_0.8.0

loaded via a namespace (and not attached):
   [1] AnnotationDbi_1.24.0   Biobase_2.22.0         biomaRt_2.18.0
   [4] bitops_1.0-6           BSgenome_1.30.0        DBI_0.2-7
   [7] GenomicFeatures_1.14.2 RCurl_1.95-4.1         RSQLite_0.11.4
[10] rtracklayer_1.22.0     stats4_3.0.2           tools_3.0.2
[13] XML_3.98-1.1           zlibbioc_1.8.0


*****  With the following VCF test.vcf:

##fileformat=VCFv4.1
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
chr20   14855644        DEL:561590:0:1:0:0:0    C       <DEL>   .       PASS    .
chr20   29627290        BND:81424:0:1:1:1       G       [chr2:114173319[G       .       MaxDepth        .
chr20   35365307        BND:54200:0:1:0:1       T       ]chr1:230941520]T       .       PASS    .
chr20   60520225        DEL:572151:1:1:6:4:0    AACGATGAGGAGCATCGCGGCTGTCTGCACCATGGGAGCCCCTTCTCACTGACAATGAGGAGCATTCAGAGTGTCTACACCGTGGCCACGCCTTCTCACCGATGCTGAGGAGCACCGAGACTGTCTGCACTGTGGCCGCCCCTTCTCACCG  A       .       PASS    .
chr20   60520443        DEL:572151:1:1:6:4:1    GACTGTCTGCACCGTGGCCGCCCCTTCTCACTGACGATGAGGAGCACTGCGACTGTCTGCACCGTGGCCGCCCTTTCTGACTGATGATAAGGAACATTGCGACTGTCTGCACCGTGGCTGCCCCTTCTCACCAACGCTGAGGAGCACTGCAACCATCTGCACCGTGGCCGCCCCTTCTCACCGATGATGAGGAACATTGAGACTGTCTGCCCCGTGGCTGCCCCTTCTCACCGATGCTGAGGAGCACTGTGACTGTCTGCACCATGGGAGCCCCTTCTCACTGACAATGAGGAGCATTCAGAGTGTCTACACCGTGGCCGCGCCTTCTCACCGATGCTGAGGAGCACCGAGACTGTCTGCACCGTGGCCGCCCCTTCTCACCGATGACGAGGAGCACTGCGA  GC      .       PASS    .
chr20   60520937        DEL:572151:1:1:11:0:0   C       <DEL>   .       PASS    .
chr20   61766068        DEL:572433:0:0:5:2:0    CAGAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCAGGGCCACAGGGGAGGCAGGGCCCAGAGAGGAGGCGGGGCCACAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCAGGGCCAC  CGG     .       PASS    .
chr20   62063686        DUP:TANDEM:572544:0:0:8:0:0     T       <DUP:TANDEM>    .       PASS    .

Thanks
Jennifer


Jennifer Becq
Bioinformatics Scientist
Illumina Cambridge Ltd
Tel: +44 (0) 1799 532300
email: jb...@illumina.com



-----Original Message-----
From: Valerie Obenchain [mailto:voben...@fhcrc.org]
Sent: 20 November 2013 17:28
To: Becq, Jennifer; bioc-devel@r-project.org
Subject: Re: VariantAnnotation writeVcf problem

Hi Jennifer,

I've cc'd your message to the Bioconductor mailing list. We have two
lists, one for general questions and the other for bug reports/feature
requests. Please post future questions to one of these lists instead
of sending them to a single person. The lists reach a wider audience
and others can chime in with their responses/experience. You can find
info about the mailing lists here,

http://www.bioconductor.org/help/mailing-list/

writeVcf() should only write out '.' for ID if the ID is missing. There is no 
restriction on the format of the ID. Can you provide a small sample of the vcf 
file you're having trouble with (just a few lines is enough)? Also include the 
output of your sessionInfo().

Valerie


On 11/15/2013 08:56 AM, Becq, Jennifer wrote:
Hi Valerie,

I've been using VariantAnnotation for quite a while now and it's been great!

However I've just encountered a problem:

If I read in a VCF and re-write it directly, the ID column has
disappeared and becomes "." instead of the original
"DEL:9586:0:1:0:0:0", even though the rownames of my VCF object are
correctly populated with the original ID column.

   > library(VariantAnnotation)

   > in1 = readVcf("my.vcf.gz", "hg19")

   > writeVcf(in1, "test.vcf")

I was wondering if that was because ID only accepts a specific format
(rsID or chr:pos)?

Thank you for your help

Jennifer

Jennifer Becq
Bioinformatics Scientist
Illumina Cambridge Ltd
Tel: +44 (0) 1799 532300
email: jb...@illumina.com




_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
