Hello Nandini,

One of our developers suggests using a combination of liftOver with 
-bedPlus option and awk:


# Discard lines beginning with # and add a 0-based start column to get 
BED 3+:
awk '$0 !~ /^#/ {printf "%s\t%d", $1, $2-1; for (n=2; n <= NF; n++) 
{printf "\t%s", $n;} printf "\n"};' input.vcf > output.bed

# Transform the first 3 columns, leaving subsequent columns untouched:
liftOver -bedPlus=3 output.bed hg19ToHg18.over.chain.gz output_hg18.bed 
unmapped.bed

# Drop the second (0-based start) column to get VCF:
cut -f1,3- output_hg18.bed > output_hg18.vcf

Note these VCF-specific caveats:

1. The header lines must be added back at the beginning of the final 
output, in order to make valid VCF, and if the header includes a 
description of the reference, that should be edited to NCBI36/hg18

2. If any INFO column attributes depend on assembly coordinates (e.g. 
END), they will be incorrect since liftOver doesn't know about them. 
Also, assembly changes may affect indel properties etc.


Best regards,

Pauline Fujita,
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu




On 09/21/11 06:20, Nandini B wrote:
> Hello,
> 
> Sometime back, I had asked for help regarding the conversion for the vcf 
> format to bed format for liftOver. But unfortunately, that did not work for 
> me.
> My vcf format goes like this
> 
> ##fileformat=VCFv4.1
> ##samtoolsVersion=0.1.16 (r963:234)
> .......
> #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT      
> sample1      sample2      sample3   sample 4
> chr1           11457    .          C       G       49.7    .       
> DP=3;AF1=0.9999;CI95=0.375,1;DP4=0,0,0,3;MQ=49;FQ=-29.3 GT:PL:GQ        
> 1/1:4,3,0:39  1/1:40,3,0:39   ...
> 
> I used the awk command
> 
> grep -v "^#" input.vcf | awk '{printf 
> "%s\t%d\t%d\t%s|%s|%s|%s|%s|%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n", 
> $1,$2-1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15}' > output.bed
> 
> Now when I tried to use it for the liftOver , I get an error saying
> 
>> ./liftOver output.bed hg19ToHg18.over.chain.gz new.bed unMapped
> Reading liftover chains
> Mapping coordinates
> Expecting integer field 7 line 1 of output.bed, got 1/1:40,3,0:39
> 
> I need to retain GT:PL:GQ values as I would need it for my further analysis.
> 
> Is there any way out of this ? Any suggestions would be appreciated..
> 
> 
> Thank you,
> Warm Regards,
> 
> Nandini 
> 
> Thank you,
> Warm Regards,
> 
> Nandini Badarinarayan
> 
> 
>> Date: Tue, 2 Aug 2011 10:06:23 -0700
>> From: [email protected]
>> To: [email protected]
>> CC: [email protected]
>> Subject: Re: [Genome] liftover hg19 to hg18
>>
>> Good Morning Nandini:
>>
>> You can convert your VCF files to bed format.
>>
>> If your VCF files follow the format:
>> http://www.1000genomes.org/node/101
>>
>> This awk command will convert them to bed format (assuming | is not a 
>> character found in the vcf fields):
>>
>> grep -v "^#" yourFile.vcf | awk '{printf "%s\t%d\t%d\t%s|%s|%s|%s|%s|%s\n", 
>> $1,$2-1,$2,$3,$4,$5,$6,$7,$8}' > yourFile.bed
>> liftOver yourFile.bed hg19ToHg18.over.chain newFile.bed unMapped
>>
>> Then, from that lifted bed file back to vcf:
>>
>> awk '{printf "%s\t%d\t%s\n", $1,$2+1,$4}' newFile.bed | tr '[|]' '[\t]' > 
>> newFile.vcf
>>
>> You may need to adjust these awk statements depending upon the exact format 
>> of your vcf file.
>>
>> --Hiram
>>
>> Nandini B wrote:
>>> Hello,
>>> I am trying to use liftover for my SNPs from hg19 to hg18. But I have my 
>>> SNP files in either gff3 format or vcf (v4.1). Is it possible to use these 
>>> files to execute the command 
>>> liftOver -gff oldFile.gff3 hg19ToHg18.over.chain newFile.gff3 unMapped  or
>>>
>>> liftOver -vcf oldFile.vcf hg19ToHg18.over.chain newFile.vcf unMapped
>>>
>>> Or do I have to convert them into BED format, if so, is there any tool that 
>>> could do this ?
>>>
>>>
>>> Thank you,
>>> Warm Regards,
>>>
>>> Nandini Badarinarayan
>                                         
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome

_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome

Reply via email to