Repost-------------------

Hello there,

I tried to obtain exact counts of the major types of transposable
elements (TEs) in hg18. The scripts I used and the results obtained
are given below. While the counts and percentages for the majority of
the TEs are very close to those previously reported, the numbers for
L1 and L2 are far from what is expected, particularly their
percentages of the genome, which come out as 145.7% and 40.5%,
respectively. I tried the same for the "rmsk" data and got the same
results. I have not yet had a chance to do this for earlier freezes.
I am not sure whether I did something wrong or whether this is due to
an error in the rmsk data. Your help in clarifying this puzzle is
greatly appreciated.

[pli...@genomics rmskRM327]$ awk -F "\t" '($13=="L1"){ln+=1;lL+=$15;
ll=sprintf("%.1f",lL/31237458.31);
print $11"\t"$12"\t"$13"\t"ln"\t"$15"\t"lL"\t"ll}' ../rmsk/*_rmsk.txt |tail -2
L1P1    LINE    L1      923537  1412    4550400188      145.7
L1M5    LINE    L1      923538  5668    4550405856      145.7
[pli...@genomics rmskRM327]$ awk -F "\t" '($13=="Alu"){ln+=1;lL+=$15;
ll=sprintf("%.1f",lL/31237458.31);
print $11"\t"$12"\t"$13"\t"ln"\t"$15"\t"lL"\t"ll}' ../rmsk/*_rmsk.txt |tail -2
AluSx   SINE    Alu     1186513 292     329628021       10.6
AluJb   SINE    Alu     1186514 303     329628324       10.6
[pli...@genomics rmskRM327]$ awk -F "\t" '($13=="L2"){ln+=1;lL+=$15;
ll=sprintf("%.1f",lL/31237458.31);
print $11"\t"$12"\t"$13"\t"ln"\t"$15"\t"lL"\t"ll}' ../rmsk/*_rmsk.txt |tail -2
L2      LINE    L2      408038  2514    1264412355      40.5
L2      LINE    L2      408039  3105    1264415460      40.5
[pli...@genomics rmskRM327]$ awk -F "\t" '($13=="DNA"){ln+=1;lL+=$15;
ll=sprintf("%.1f",lL/31237458.31);
print $11"\t"$12"\t"$13"\t"ln"\t"$15"\t"lL"\t"ll}' ../rmsk/*_rmsk.txt |tail -2
MER53   DNA     DNA     13618   193     2145668 0.1
MER99   DNA     DNA     13619   557     2146225 0.1
[pli...@genomics rmskRM327]$ awk -F "\t" '($12=="DNA"){ln+=1;lL+=$15;
ll=sprintf("%.1f",lL/31237458.31);
print $11"\t"$12"\t"$13"\t"ln"\t"$15"\t"lL"\t"ll}' ../rmsk/*_rmsk.txt |tail -2
MER58   DNA     MER1_type       391100  92      219940827       7.0
MER5B   DNA     MER1_type       391101  104     219940931       7.0
[pli...@genomics rmskRM327]$ awk -F "\t" '($12=="LTR"){ln+=1;lL+=$15;
ll=sprintf("%.1f",lL/31237458.31);
print $11"\t"$12"\t"$13"\t"ln"\t"$15"\t"lL"\t"ll}' ../rmsk/*_rmsk.txt |tail -2
MLT1C   LTR     MaLR    653850  231     532852066       17.1
MER66A  LTR     ERV1    653851  478     532852544       17.1
[pli...@genomics rmskRM327]$ awk -F "\t" '($13=="MIR"){ln+=1;lL+=$15;
ll=sprintf("%.1f",lL/31237458.31);
print $11"\t"$12"\t"$13"\t"ln"\t"$15"\t"lL"\t"ll}' ../rmsk/*_rmsk.txt |tail -2
MIRb    SINE    MIR     589046  185     121469080       3.9
MIR     SINE    MIR     589047  143     121469223       3.9
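
A possible cross-check of the commands above, offered only as a sketch.
It assumes the files follow the standard UCSC rmsk table layout, in
which columns 7 and 8 are genoStart and genoEnd (the genomic
coordinates of each hit), while column 15 is repEnd (a position within
the repeat consensus). Under that assumption, summing column 8 minus
column 7 gives the genomic bases covered, which may differ a lot from
the sum of column 15 for long elements such as L1. The function name
te_coverage and its arguments are hypothetical, chosen for this
illustration only.

```shell
#!/bin/sh
# te_coverage COLUMN VALUE GENOME_SIZE FILE...
# Hypothetical helper: for rows whose COLUMN equals VALUE, count the hits
# and sum the genomic span (genoEnd - genoStart, assumed to be columns 8
# and 7). Prints: VALUE, hit count, bases covered, percent of GENOME_SIZE.
te_coverage() {
    col=$1
    value=$2
    genome_size=$3
    shift 3
    awk -F '\t' -v col="$col" -v value="$value" -v size="$genome_size" '
        # Column 12 is repClass and column 13 is repFamily in this layout.
        $col == value { n += 1; bp += $8 - $7 }
        END { printf "%s\t%d\t%.0f\t%.1f\n", value, n, bp, 100 * bp / size }
    ' "$@"
}
```

For example, te_coverage 13 L1 3123745831 ../rmsk/*_rmsk.txt would
report the L1 family, where 3123745831 is the genome size implied by
the divisor 31237458.31 used above. Overlapping or nested annotations,
if any, would still be counted more than once by this sum.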

Thanks,
Ping
--
Ping Liang, PhD
Associate Professor & Canada Research Chair
Department of Biological Sciences
Brock University
St. Catharines, Ontario
Canada L2S 3A1

TEL: 905-688-5550 X 5922
FAX: 905-688-1855
EMail: [email protected]



------------------------------------------------ 
Jennifer Jackson 
UCSC Genome Bioinformatics Group 
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
