Hi Chirag,

From one of our engineers:

"ESTs are categorized as spliced based on having apparent canonical (GT/AG) introns. The program looks for gaps in the target (genome) that are at least 32 bases long and have no associated unaligned query (EST) sequence. Since ESTs are often reverse complemented, it considers both orientations. If there are more apparent introns in one orientation than the other, it considers the EST spliced.
This is implemented by the pslIntronsOnly program."

Please let us know if you have any additional questions: [email protected]

- Greg Roe
UCSC Genome Bioinformatics Group

On 7/18/11 4:28 PM, [email protected] wrote:
> Dear UCSC,
>
> I was interested in getting the unspliced ESTs in any genome of interest.
> So, I downloaded the two datasets:
> Zebrafish ESTs Including Unspliced
> Zebrafish ESTs That Have Been Spliced
>
> and the data which are only present in "Zebrafish ESTs Including
> Unspliced" will give me the unspliced ESTs.
>
> I have a question regarding this.
>
> When I look at this unspliced data, I find many regions are 10-20 KB
> long, and apparently have splice sites as well. I guess, however,
> those spliced regions do not have canonical introns, GT/AG ends, which
> is why they are considered unspliced ESTs.
>
> My concern is: I am not sure if it is good (at least in my case) to
> consider these tens-of-KB-long ESTs (with splice junctions) as
> unspliced ESTs. I am basically interested in looking for relatively
> short (up to a few KB) unspliced ESTs in introns and 3'UTRs.
>
> Among this unspliced dataset, I could just exclude the long ESTs and
> just focus on the small ESTs, but I am not sure if that is the best way.
> Could you please suggest how to deal with this?
>
> Thank you for your help in advance!
>
> regards
> Chirag
>
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
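For anyone following this thread, the rule quoted from the engineer can be sketched roughly as below. This is only an illustration of the description in the reply, not the pslIntronsOnly source: the function names, the `(q_start, t_start, size)` block tuples, and passing the genome as a plain string are all assumptions made for the example, and PSL strand-specific coordinate handling is ignored. The final comparison follows the reply's wording literally (more apparent introns in one orientation than the other).

```python
MIN_INTRON = 32  # minimum target gap length, taken from the reply above


def count_introns(blocks, genome):
    """Count apparent canonical introns in each orientation.

    blocks: hypothetical list of (q_start, t_start, size) tuples,
            0-based and sorted by target position.
    genome: target sequence as a string (an assumption for this sketch).
    Returns (forward_count, reverse_count).
    """
    fwd = rev = 0
    for (q1, t1, s1), (q2, t2, _) in zip(blocks, blocks[1:]):
        gap_start, gap_end = t1 + s1, t2
        q_gap = q2 - (q1 + s1)
        # Skip gaps that are too short or that leave query bases unaligned.
        if gap_end - gap_start < MIN_INTRON or q_gap != 0:
            continue
        ends = (genome[gap_start:gap_start + 2].upper(),
                genome[gap_end - 2:gap_end].upper())
        if ends == ("GT", "AG"):      # intron on the forward strand
            fwd += 1
        elif ends == ("CT", "AC"):    # GT/AG seen reverse complemented
            rev += 1
    return fwd, rev


def is_spliced(blocks, genome):
    """Spliced if one orientation has more apparent introns than the other."""
    fwd, rev = count_introns(blocks, genome)
    return fwd != rev


# Toy example: two 5-base blocks separated by a 34-base GT...AG gap.
genome = "AAAAA" + "GT" + "C" * 30 + "AG" + "TTTTT"
blocks = [(0, 0, 5), (5, 39, 5)]
print(is_spliced(blocks, genome))  # prints True
```

Under this reading, a long EST alignment whose gaps lack GT/AG (or CT/AC) ends, or whose gaps also leave EST bases unaligned, would land in the unspliced set even though it visibly spans gaps in the genome, which matches what Chirag observed.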
