Hello,

A good way to double-check your data is to use the Table browser. It has functions that summarize the values of the key fields in a table.
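If the per-chromosome rmsk files are already downloaded, you can sketch a rough local counterpart to those per-key summaries in awk. This is a minimal sketch and not a Table browser feature. It assumes tab-separated rows in the standard rmsk column order, so column 13 is repFamily. `key_counts` is a hypothetical helper name.

```shell
# key_counts COL FILE...   (hypothetical helper)
# Tallies how many rows carry each distinct value in column COL of
# tab-separated files, then sorts by that count, largest first.
key_counts() {
  col=$1
  shift
  awk -F '\t' -v c="$col" '{ n[$c]++ } END { for (k in n) printf "%s\t%d\n", k, n[k] }' "$@" |
    sort -t "$(printf '\t')" -k2,2nr
}
```

For example, `key_counts 13 ../rmsk/*_rmsk.txt | head` would list the repeat families with the most rows.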
1) Open the Table browser and set the genome information.
2) Set the group to "Variation and Repeats", then set the track to "RepeatMasker".
3) Leave the rmsk table selected (primary, default) and click "describe table schema".
4) On the next page, you can get summary information for each key in the table by clicking the links in the schema (e.g. "value", "range").

Try it and see if this helps,
Jennifer
------------------------------------------------
Jennifer Jackson
UCSC Genome Bioinformatics Group

----- "Jennifer Jackson" wrote:

> From: "Jennifer Jackson"
> To: "Ping Liang", "genome"
> Sent: Tuesday, September 29, 2009 9:09:18 AM GMT -08:00 US/Canada Pacific
> Subject: [Genome] A question about the counts of L1 and L2 in rmsk files in hg18
>
> Repost-------------------
>
> Hello there,
>
> I tried to obtain the exact counts of the major types of transposable
> elements (TEs) in hg18. Below are the scripts I used and the results
> I obtained. The numbers and percentages for most of the TEs are very
> close to those previously reported. The numbers for L1 and L2, however,
> are far from what I expected, particularly their percentages of the
> genome: 145% and 40.5%, respectively. I tried the same with the "rmsk"
> data and got the same result. I haven't had a chance to do this for
> earlier freezes yet. I'm not sure whether I did something wrong or
> whether this is caused by an error in the rmsk data. I would greatly
> appreciate your help in clarifying this puzzle.
>
> [pli...@genomics rmskRM327]$ awk -F "\t" '($13=="L1"){ln+=1;lL+=$15; ll=sprintf("%.1f",lL/31237458.31);print $11"\t"$12"\t"$13"\t"ln"\t"$15"\t"lL"\t"ll}' ../rmsk/*_rmsk.txt | tail -2
> L1P1 LINE L1 923537 1412 4550400188 145.7
> L1M5 LINE L1 923538 5668 4550405856 145.7
>
> [pli...@genomics rmskRM327]$ awk -F "\t" '($13=="Alu"){ln+=1;lL+=$15; ll=sprintf("%.1f",lL/31237458.31);print $11"\t"$12"\t"$13"\t"ln"\t"$15"\t"lL"\t"ll}' ../rmsk/*_rmsk.txt | tail -2
> AluSx SINE Alu 1186513 292 329628021 10.6
> AluJb SINE Alu 1186514 303 329628324 10.6
>
> [pli...@genomics rmskRM327]$ awk -F "\t" '($13=="L2"){ln+=1;lL+=$15; ll=sprintf("%.1f",lL/31237458.31);print $11"\t"$12"\t"$13"\t"ln"\t"$15"\t"lL"\t"ll}' ../rmsk/*_rmsk.txt | tail -2
> L2 LINE L2 408038 2514 1264412355 40.5
> L2 LINE L2 408039 3105 1264415460 40.5
>
> [pli...@genomics rmskRM327]$ awk -F "\t" '($13=="DNA"){ln+=1;lL+=$15; ll=sprintf("%.1f",lL/31237458.31);print $11"\t"$12"\t"$13"\t"ln"\t"$15"\t"lL"\t"ll}' ../rmsk/*_rmsk.txt | tail -2
> MER53 DNA DNA 13618 193 2145668 0.1
> MER99 DNA DNA 13619 557 2146225 0.1
>
> [pli...@genomics rmskRM327]$ awk -F "\t" '($12=="DNA"){ln+=1;lL+=$15; ll=sprintf("%.1f",lL/31237458.31);print $11"\t"$12"\t"$13"\t"ln"\t"$15"\t"lL"\t"ll}' ../rmsk/*_rmsk.txt | tail -2
> MER58 DNA MER1_type 391100 92 219940827 7.0
> MER5B DNA MER1_type 391101 104 219940931 7.0
>
> [pli...@genomics rmskRM327]$ awk -F "\t" '($12=="LTR"){ln+=1;lL+=$15; ll=sprintf("%.1f",lL/31237458.31);print $11"\t"$12"\t"$13"\t"ln"\t"$15"\t"lL"\t"ll}' ../rmsk/*_rmsk.txt | tail -2
> MLT1C LTR MaLR 653850 231 532852066 17.1
> MER66A LTR ERV1 653851 478 532852544 17.1
>
> [pli...@genomics rmskRM327]$ awk -F "\t" '($13=="MIR"){ln+=1;lL+=$15; ll=sprintf("%.1f",lL/31237458.31);print $11"\t"$12"\t"$13"\t"ln"\t"$15"\t"lL"\t"ll}' ../rmsk/*_rmsk.txt | tail -2
> MIRb SINE MIR 589046 185 121469080 3.9
> MIR SINE MIR 589047 143 121469223 3.9
>
> Thanks,
> Ping
> --
> Ping Liang, PhD
> Associate Professor & Canada Research Chair
> Department of Biological Sciences
> Brock University
> St. Catharines, Ontario
> Canada L2S 3A1
>
> TEL: 905-688-5550 X 5922
> FAX: 905-688-1855
>
> ------------------------------------------------
> Jennifer Jackson
> UCSC Genome Bioinformatics Group
> _______________________________________________
> Genome maillist
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
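One point to check in the quoted scripts: assume the standard rmsk column order (bin, swScore, milliDiv, milliDel, milliIns, genoName, genoStart, genoEnd, genoLeft, strand, repName, repClass, repFamily, repStart, repEnd, repLeft, id). This order matches the repName/repClass/repFamily values printed for $11, $12 and $13 above. Under that assumption, column 15 is repEnd. That is a coordinate within the repeat consensus, not the length of the genomic copy. Summing it would inflate totals most for families with long consensus sequences (full-length L1 is around 6 kb). It would barely affect short ones like Alu, which could explain why only L1 and L2 look far off. The following is a minimal sketch, under that column-order assumption, that sums genoEnd - genoStart ($8 - $7) per repFamily instead. `te_totals` and `GENOME_SIZE` are hypothetical names. The default genome size is simply the poster's divisor (31237458.31) times 100.

```shell
# te_totals FILE...   (hypothetical helper)
# For each repFamily (column 13), print: family, number of rows,
# genomic bases covered (sum of genoEnd - genoStart, columns 8 and 7),
# and that sum as a percentage of GENOME_SIZE, sorted by bases covered.
# Assumes tab-separated rmsk rows that include the leading bin column.
te_totals() {
  # Default is the divisor from the quoted scripts x 100; override if needed.
  genome=${GENOME_SIZE:-3123745831}
  awk -F '\t' -v g="$genome" '
    { n[$13]++; bp[$13] += $8 - $7 }
    END {
      for (f in n)
        printf "%s\t%d\t%.0f\t%.1f\n", f, n[f], bp[f], 100 * bp[f] / g
    }' "$@" | sort -t "$(printf '\t')" -k3,3nr
}
```

For example, `te_totals ../rmsk/*_rmsk.txt | head` would list the families covering the most bases. The `*_rmsk.txt` glob matches every per-sequence file in the directory, so any random or alternate-haplotype sequences present there are counted as well. Rows are repeat fragments, so one inserted element split into several pieces is counted more than once, as in the original scripts.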
