Hi Heikki,
On 06/22/11 13:53, Heikki Salavirta wrote:
> Hello,
>
> I've been running einverted queries with the same sequence, with the
> only difference in parameters being max repeat (e.g. 1000, 2000, 4000,
> 6000, 8000, 10000 & 20000). Other parameters are gap: 12, threshold: 50,
> match: 3 & mismatch: -4.
>
> I'd expect that this would result in an ever increasing number of
> results, as the max repeat parameter increases. However, this is not
> what I'm seeing.
>
> E.g. when max repeat is 2000, there are 3 results from the sequence
> prior to >10kb loci.
> It's the same result when max repeat is 4000.
> However, when max repeat is 6000, there are 2 results prior to >10kb
> loci, and 1 of them is not reported by 2000 & 4000 queries. In this
> particular result the gap between the inverted repeats is only 2
> nucleotides!
> When max repeat is 8000, there's only 1 result prior to >10kb loci,
> which is also reported by 2000, 4000 & 6000 queries.
> When max repeat is 1000, there are 4 results prior to >10kb loci.
>
> Could somebody perhaps explain these unexpected results, and perhaps
> suggest proper parameters for finding all inverted repeats from a >150kb
> sequence?

einverted was designed for the annotation of the Caenorhabditis elegans
genome. It deliberately does not find all inverted repeats.
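The algorithm searches the max repeat region and reports the
highest-scoring repeat.
It can also report other high-scoring repeats in the region, but only if
they are not already covered by a result. A larger max repeat can
therefore give fewer hits, because a single high-scoring repeat found in
the wider region may cover several that were reported separately before.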
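
To illustrate the reporting rule described above, here is a toy sketch in
Python. It is not the einverted source code. The Repeat tuple, the
select_repeats function and the candidate coordinates are all made up for
the example, and "covered" is assumed to mean "overlaps a repeat that has
already been reported".

```python
from typing import List, NamedTuple


class Repeat(NamedTuple):
    start: int  # first base of the left arm (hypothetical coordinates)
    end: int    # last base of the right arm
    score: int  # alignment score of the repeat


def overlaps(a: Repeat, b: Repeat) -> bool:
    """True if the two repeats share at least one position."""
    return a.start <= b.end and b.start <= a.end


def select_repeats(candidates: List[Repeat], max_repeat: int,
                   threshold: int = 50) -> List[Repeat]:
    """Greedy selection: best score first, skip anything already covered."""
    # Only repeats that fit in the max repeat span and reach the threshold.
    eligible = [c for c in candidates
                if c.score >= threshold and c.end - c.start + 1 <= max_repeat]
    reported: List[Repeat] = []
    for cand in sorted(eligible, key=lambda c: c.score, reverse=True):
        # Assumption: "covered" means overlapping an already reported hit.
        if not any(overlaps(cand, hit) for hit in reported):
            reported.append(cand)
    return sorted(reported, key=lambda hit: hit.start)


# Invented candidates: three short repeats and one long, high-scoring one.
CANDIDATES = [
    Repeat(100, 400, 60),
    Repeat(900, 1300, 70),
    Repeat(2000, 2500, 55),
    Repeat(50, 5000, 120),
]

for max_repeat in (2000, 6000):
    hits = select_repeats(CANDIDATES, max_repeat)
    print(max_repeat, len(hits), hits)
```

In this toy data the long repeat only fits once max repeat reaches 6000.
It then scores highest and covers the three short repeats, so the number
of reported hits drops from 3 to 1 as max repeat grows.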
We have changed the way overlapping scores in the max repeat region are
detected (there was a problem in the original program when the repeat
traceback went past the end of the region), but this should not
significantly change the number of reported hits.
As for finding all inverted repeats: I think that is not possible with
einverted, because it uses a dynamic programming approach and will fail
to find overlapping repeats.
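
For contrast, here is a minimal Python sketch of an exhaustive search that
does report overlapping candidates. It is not einverted and not a
replacement for it: it only finds exact reverse-complement matches of a
fixed length k, with no mismatches, gaps or scoring. The function names
and the parameters k and max_span are invented for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def exact_inverted_seeds(seq: str, k: int, max_span: int) -> List[Tuple[int, int]]:
    """Return (i, j) pairs where seq[j:j+k] is the reverse complement of
    seq[i:i+k], the arms do not overlap each other, and the whole repeat
    (left arm, spacer, right arm) is at most max_span bases long."""
    seq = seq.upper()
    # Index every k-mer by its start positions (0-based).
    positions: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    seeds: List[Tuple[int, int]] = []
    for i in range(len(seq) - k + 1):
        partner = reverse_complement(seq[i:i + k])
        for j in positions.get(partner, []):
            # Right arm must start after the left arm ends.
            if j >= i + k and j + k - i <= max_span:
                seeds.append((i, j))
    return seeds


# Made-up sequence: ACGG at position 2 pairs with CCGT at position 11.
seeds = exact_inverted_seeds("TTACGGAAAAACCGTTT", k=4, max_span=20)
print(seeds)
```

Every seed is kept, including ones that overlap other seeds, so the
output would still need extension and filtering before it is useful on a
>150kb sequence.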
We would be interested in any other algorithms we could implement if
there are no open-source applications available for them.
Hope this helps
Peter Rice
EMBOSS Team