Hello,

A good way to double-check your data is to use the Table Browser. It has
functions that summarize the values of each field in a table.

1) Open the Table Browser and set the genome and assembly
2) Set the group to Variation and Repeats, then the track to RepeatMasker
3) Leaving the rmsk table selected (primary, default), click on "describe table schema"
4) On the next page, you can get summary information for each field in
the table by clicking on the links in the schema (ex: values, range)
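
If you also want to cross-check the totals from a local dump, here is a minimal Python sketch, not an official UCSC tool. It assumes the standard rmsk column order with a leading bin column (genoStart and genoEnd in columns 7 and 8, repFamily in column 13), which appears to match the repName/repClass/repFamily values printed from columns 11-13 in your output. If that layout holds, column 15 would be repEnd, a coordinate within the repeat consensus rather than a genomic length, so the sketch sums genoEnd - genoStart instead. The function name and the sample rows are made up for illustration.

```python
from collections import defaultdict

# Assumed 0-based field positions in an rmsk dump with a leading "bin" column.
GENO_START, GENO_END, REP_FAMILY = 6, 7, 12


def summarize_rmsk(lines):
    """Return {repFamily: (count, total_genomic_bases)} for tab-separated rmsk rows."""
    counts = defaultdict(int)
    bases = defaultdict(int)
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= REP_FAMILY:
            continue  # skip blank or truncated rows
        family = fields[REP_FAMILY]
        counts[family] += 1
        # Genomic span covered by this annotation (0-based start, exclusive end).
        bases[family] += int(fields[GENO_END]) - int(fields[GENO_START])
    return {family: (counts[family], bases[family]) for family in counts}


# Hypothetical rows, invented only to exercise the function.
rows = [
    "585\t1000\t100\t10\t5\tchr1\t100\t400\t-1000\t+\tL1P1\tLINE\tL1\t1112\t1412\t-4734\t1",
    "585\t900\t120\t0\t0\tchr1\t1000\t1250\t-500\t+\tAluSx\tSINE\tAlu\t1\t250\t-62\t2",
    "586\t800\t200\t0\t0\tchr1\t2000\t2100\t-100\t-\tL1M5\tLINE\tL1\t-480\t5668\t5568\t3",
]

for family, (count, total) in sorted(summarize_rmsk(rows).items()):
    print(f"{family}\t{count}\t{total}")
```

To run it on real data, pass an open file (or lines chained from several files) to summarize_rmsk, then divide each total by the genome size to get a percentage.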

Try and see if this helps,
Jennifer


------------------------------------------------ 
Jennifer Jackson 
UCSC Genome Bioinformatics Group 

----- "Jennifer Jackson" <[email protected]> wrote:

> From: "Jennifer Jackson" <[email protected]>
> To: "Ping Liang" <[email protected]>, "genome" <[email protected]>
> Sent: Tuesday, September 29, 2009 9:09:18 AM GMT -08:00 US/Canada Pacific
> Subject: [Genome] A question about the counts of L1 and L2 in rmsk files in hg18
>
> Repost-------------------
> 
> Hello there,
> 
> I tried to obtain the exact counts of the major types of transposable
> elements (TEs) in hg18. I have provided below the scripts I used and
> the results obtained. While the numbers and percentages for the majority
> of the TEs seem to be very close to those previously reported, the
> numbers for L1 and L2 are way off from what is expected, particularly
> their percentages (of the genome), which come out as 145% and 40.5%,
> respectively. I tried the same for the "rmsk" data and got the same
> result. I haven't had a chance to do that for earlier freezes. I am not
> sure if I did something wrong or if this is due to an error in the rmsk
> data. Your help in clarifying this puzzle is greatly appreciated.
> 
> [pli...@genomics rmskRM327]$ awk -F "\t" '($13=="L1"){ln+=1;lL+=$15;ll=sprintf("%.1f",lL/31237458.31);print $11"\t"$12"\t"$13"\t"ln"\t"$15"\t"lL"\t"ll}' ../rmsk/*_rmsk.txt |tail -2
> L1P1    LINE    L1      923537  1412    4550400188      145.7
> L1M5    LINE    L1      923538  5668    4550405856      145.7
> [pli...@genomics rmskRM327]$ awk -F "\t" '($13=="Alu"){ln+=1;lL+=$15;ll=sprintf("%.1f",lL/31237458.31);print $11"\t"$12"\t"$13"\t"ln"\t"$15"\t"lL"\t"ll}' ../rmsk/*_rmsk.txt |tail -2
> AluSx   SINE    Alu     1186513 292     329628021       10.6
> AluJb   SINE    Alu     1186514 303     329628324       10.6
> [pli...@genomics rmskRM327]$ awk -F "\t" '($13=="L2"){ln+=1;lL+=$15;ll=sprintf("%.1f",lL/31237458.31);print $11"\t"$12"\t"$13"\t"ln"\t"$15"\t"lL"\t"ll}' ../rmsk/*_rmsk.txt |tail -2
> L2      LINE    L2      408038  2514    1264412355      40.5
> L2      LINE    L2      408039  3105    1264415460      40.5
> [pli...@genomics rmskRM327]$ awk -F "\t" '($13=="DNA"){ln+=1;lL+=$15;ll=sprintf("%.1f",lL/31237458.31);print $11"\t"$12"\t"$13"\t"ln"\t"$15"\t"lL"\t"ll}' ../rmsk/*_rmsk.txt |tail -2
> MER53   DNA     DNA     13618   193     2145668 0.1
> MER99   DNA     DNA     13619   557     2146225 0.1
> [pli...@genomics rmskRM327]$ awk -F "\t" '($12=="DNA"){ln+=1;lL+=$15;ll=sprintf("%.1f",lL/31237458.31);print $11"\t"$12"\t"$13"\t"ln"\t"$15"\t"lL"\t"ll}' ../rmsk/*_rmsk.txt |tail -2
> MER58   DNA     MER1_type       391100  92      219940827       7.0
> MER5B   DNA     MER1_type       391101  104     219940931       7.0
> [pli...@genomics rmskRM327]$ awk -F "\t" '($12=="LTR"){ln+=1;lL+=$15;ll=sprintf("%.1f",lL/31237458.31);print $11"\t"$12"\t"$13"\t"ln"\t"$15"\t"lL"\t"ll}' ../rmsk/*_rmsk.txt |tail -2
> MLT1C   LTR     MaLR    653850  231     532852066       17.1
> MER66A  LTR     ERV1    653851  478     532852544       17.1
> [pli...@genomics rmskRM327]$ awk -F "\t" '($13=="MIR"){ln+=1;lL+=$15;ll=sprintf("%.1f",lL/31237458.31);print $11"\t"$12"\t"$13"\t"ln"\t"$15"\t"lL"\t"ll}' ../rmsk/*_rmsk.txt |tail -2
> MIRb    SINE    MIR     589046  185     121469080       3.9
> MIR     SINE    MIR     589047  143     121469223       3.9
> 
> Thanks,
> Ping
> --
> Ping Liang, PhD
> Associate Professor & Canada Research Chair
> Department of Biological Sciences
> Brock University
> St. Catharines, Ontario
> Canada L2S 3A1
> 
> TEL: 905-688-5550 X 5922
> FAX: 905-688-1855
> EMail: [email protected]
> 
> 
> 
> ------------------------------------------------ 
> Jennifer Jackson 
> UCSC Genome Bioinformatics Group 
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
