Repost-------------------
Hello there,
I tried to obtain the exact counts of the major types of transposable
elements (TEs) in hg18. Below are the scripts I used and the results
I obtained. While the counts and percentages for the majority of the
TEs are very close to those previously reported, the numbers for L1
and L2 are way off from what is expected, particularly their
percentages of the genome: 145.7% and 40.5%, respectively. I tried the
same with the "rmsk" data and got the same result. I haven't had a
chance to do that for earlier freezes. I am not sure whether I did
something wrong or whether this is due to an error in the rmsk data.
Your help in clarifying this puzzle is greatly appreciated.
[pli...@genomics rmskRM327]$ awk -F "\t" '($13=="L1"){ln+=1; lL+=$15;
ll=sprintf("%.1f",lL/31237458.31);
print $11"\t"$12"\t"$13"\t"ln"\t"$15"\t"lL"\t"ll}' ../rmsk/*_rmsk.txt | tail -2
L1P1 LINE L1 923537 1412 4550400188 145.7
L1M5 LINE L1 923538 5668 4550405856 145.7
[pli...@genomics rmskRM327]$ awk -F "\t" '($13=="Alu"){ln+=1; lL+=$15;
ll=sprintf("%.1f",lL/31237458.31);
print $11"\t"$12"\t"$13"\t"ln"\t"$15"\t"lL"\t"ll}' ../rmsk/*_rmsk.txt | tail -2
AluSx SINE Alu 1186513 292 329628021 10.6
AluJb SINE Alu 1186514 303 329628324 10.6
[pli...@genomics rmskRM327]$ awk -F "\t" '($13=="L2"){ln+=1; lL+=$15;
ll=sprintf("%.1f",lL/31237458.31);
print $11"\t"$12"\t"$13"\t"ln"\t"$15"\t"lL"\t"ll}' ../rmsk/*_rmsk.txt | tail -2
L2 LINE L2 408038 2514 1264412355 40.5
L2 LINE L2 408039 3105 1264415460 40.5
[pli...@genomics rmskRM327]$ awk -F "\t" '($13=="DNA"){ln+=1; lL+=$15;
ll=sprintf("%.1f",lL/31237458.31);
print $11"\t"$12"\t"$13"\t"ln"\t"$15"\t"lL"\t"ll}' ../rmsk/*_rmsk.txt | tail -2
MER53 DNA DNA 13618 193 2145668 0.1
MER99 DNA DNA 13619 557 2146225 0.1
[pli...@genomics rmskRM327]$ awk -F "\t" '($12=="DNA"){ln+=1; lL+=$15;
ll=sprintf("%.1f",lL/31237458.31);
print $11"\t"$12"\t"$13"\t"ln"\t"$15"\t"lL"\t"ll}' ../rmsk/*_rmsk.txt | tail -2
MER58 DNA MER1_type 391100 92 219940827 7.0
MER5B DNA MER1_type 391101 104 219940931 7.0
[pli...@genomics rmskRM327]$ awk -F "\t" '($12=="LTR"){ln+=1; lL+=$15;
ll=sprintf("%.1f",lL/31237458.31);
print $11"\t"$12"\t"$13"\t"ln"\t"$15"\t"lL"\t"ll}' ../rmsk/*_rmsk.txt | tail -2
MLT1C LTR MaLR 653850 231 532852066 17.1
MER66A LTR ERV1 653851 478 532852544 17.1
[pli...@genomics rmskRM327]$ awk -F "\t" '($13=="MIR"){ln+=1; lL+=$15;
ll=sprintf("%.1f",lL/31237458.31);
print $11"\t"$12"\t"$13"\t"ln"\t"$15"\t"lL"\t"ll}' ../rmsk/*_rmsk.txt | tail -2
MIRb SINE MIR 589046 185 121469080 3.9
MIR SINE MIR 589047 143 121469223 3.9
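
A minimal sketch for cross-checking the totals above, assuming the
*_rmsk.txt files follow the UCSC rmsk table layout, in which columns 7
and 8 are genoStart and genoEnd and column 15 is repEnd (the end
coordinate within the repeat consensus rather than a length). The three
sample rows are made up for illustration and are not real hg18 data.
For each repFamily it tallies the copy count, the summed genomic span
($8-$7), and the summed $15, so the two totals can be compared side by
side.

```shell
# Three made-up rows in the assumed 17-column rmsk layout (not real hg18 data).
sample=$(mktemp)
printf '585\t1000\t100\t0\t0\tchr1\t1000\t1300\t-5000\t+\tL1P1\tLINE\tL1\t5800\t6100\t0\t1\n' > "$sample"
printf '585\t900\t120\t0\t0\tchr1\t2000\t2500\t-3800\t-\tL1M5\tLINE\tL1\t5200\t5700\t400\t2\n' >> "$sample"
printf '585\t800\t90\t0\t0\tchr1\t3000\t3290\t-3010\t+\tAluSx\tSINE\tAlu\t1\t290\t-22\t3\n' >> "$sample"

# One pass over the file, keyed by repFamily (column 13).
summary=$(awk -F "\t" '
{
    n[$13] += 1            # copies per family
    geno[$13] += $8 - $7   # genomic span (assumed genoEnd - genoStart)
    rep[$13] += $15        # the quantity summed by the commands above
}
END {
    # family, copies, summed genomic span, summed column 15
    for (f in n) print f "\t" n[f] "\t" geno[f] "\t" rep[f]
}' "$sample" | sort)

echo "$summary"
rm -f "$sample"
```

If the layout assumption holds, the genomic-span total divided by
31237458.31 (rather than the $15 total) would be the figure to compare
against previously reported genome percentages.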
Thanks,
Ping
--
Ping Liang, PhD
Associate Professor & Canada Research Chair
Department of Biological Sciences
Brock University
St. Catharines, Ontario
Canada L2S 3A1
TEL: 905-688-5550 X 5922
FAX: 905-688-1855
EMail: [email protected]
------------------------------------------------
Jennifer Jackson
UCSC Genome Bioinformatics Group