Hi Heikki,

On 06/22/11 13:53, Heikki Salavirta wrote:
Hello,

I've been running einverted queries with the same sequence, with the
only difference in parameters being max repeat (e.g. 1000, 2000, 4000,
6000, 8000, 10000 & 20000). Other parameters are gap: 12, threshold: 50,
match: 3 & mismatch: -4.

I'd expect the number of results to keep increasing as the max repeat
parameter increases. However, this is not what I'm seeing.

E.g. when max repeat is 2000, there are 3 results in the first 10 kb of
the sequence.

The results are the same when max repeat is 4000.

However, when max repeat is 6000, there are only 2 results in the first
10 kb, and 1 of them is not reported by the 2000 and 4000 queries. In
this particular result, the gap between the inverted repeats is only 2
nucleotides!

When max repeat is 8000, there is only 1 result in the first 10 kb,
which is also reported by the 2000, 4000 and 6000 queries.

When max repeat is 1000, there are 4 results in the first 10 kb.

Could somebody explain these unexpected results, and perhaps suggest
proper parameters for finding all inverted repeats in a >150kb
sequence?

einverted was designed for the annotation of the Caenorhabditis elegans genome. It deliberately does not find all inverted repeats.

The algorithm searches the max repeat region and reports the highest-scoring repeat.

It can also report other high-scoring repeats in the region, but only if they are not already covered by a reported result.
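To show why this kind of rule need not give more hits as max repeat grows, here is a toy model only, not the einverted source. It assumes the candidate repeats are already known and that the sequence is cut into fixed, non-overlapping regions of length max repeat; the names `Hit` and `report_hits` and the three candidate hits are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """A candidate inverted repeat (hypothetical toy representation)."""
    start: int  # 0-based start of the left arm
    end: int    # exclusive end of the right arm
    score: int


def report_hits(hits, seq_len, max_repeat):
    """Toy reporting rule, NOT the einverted source.

    The sequence is cut into consecutive, non-overlapping regions of length
    max_repeat. Within each region, hits lying entirely inside it are taken
    best score first; a hit is reported only if it does not overlap a hit
    that has already been reported.
    """
    reported = []
    for region_start in range(0, seq_len, max_repeat):
        region_end = min(region_start + max_repeat, seq_len)
        inside = [h for h in hits
                  if h.start >= region_start and h.end <= region_end]
        # Highest score first, so the best hit in each region is always kept.
        for h in sorted(inside, key=lambda x: -x.score):
            if all(h.end <= r.start or h.start >= r.end for r in reported):
                reported.append(h)
    return sorted(reported, key=lambda x: x.start)


# Three made-up candidates on a 12 kb sequence.
hits = [
    Hit(100, 7000, 200),   # long, high-scoring repeat
    Hit(1500, 2500, 90),   # crosses the 2000 region boundary
    Hit(3000, 3500, 60),
]
for max_repeat in (2000, 4000, 8000):
    print(max_repeat, [h.score for h in report_hits(hits, 12000, max_repeat)])
```

In this toy model the counts are 1, 2 and 1 for max repeat 2000, 4000 and 8000: a hit that straddles a region boundary is lost at 2000, and at 8000 the long high-scoring hit covers the two smaller ones, so they are suppressed.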

We have changed the way overlapping scores in the max repeat region are detected (the original program had a problem when the repeat traceback went past the end of the region), but this should not significantly change the number of reported hits.

As for finding all inverted repeats: I don't think that is possible with einverted, because it uses a dynamic programming approach and will fail to find overlapping repeats.
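As a rough illustration of the dynamic programming point, here is a minimal Smith-Waterman sketch that aligns a sequence against its own reverse complement and returns only the single best-scoring cell. It is not the einverted source: it reuses the match/mismatch/gap values from Heikki's runs but assumes a simple linear gap penalty, and the function names are hypothetical. Because each DP cell keeps just one best score, a second repeat that shares cells with the winning path gets no separate score of its own.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (A/C/G/T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seq))


def best_inverted_repeat(seq: str, match: int = 3, mismatch: int = -4,
                         gap: int = 12) -> tuple:
    """Local alignment of seq against revcomp(seq) with a linear gap penalty.

    Returns (score, i, j): the best score and its 1-based cell, where i is a
    position in seq and j a position in revcomp(seq). Only one row of the DP
    matrix is kept, since we need just the single best cell.
    """
    rc = revcomp(seq)
    n = len(seq)
    prev = [0] * (n + 1)          # previous DP row
    best = (0, 0, 0)
    for i in range(1, n + 1):
        cur = [0] * (n + 1)       # current DP row
        for j in range(1, n + 1):
            pair = match if seq[i - 1] == rc[j - 1] else mismatch
            # Local alignment: a score never drops below zero.
            cur[j] = max(0,
                         prev[j - 1] + pair,   # pair seq[i-1] with rc[j-1]
                         prev[j] - gap,        # gap in rc
                         cur[j - 1] - gap)     # gap in seq
            if cur[j] > best[0]:
                best = (cur[j], i, j)
        prev = cur
    return best
```

For example, `best_inverted_repeat("AAAATTTT")` returns `(24, 8, 8)`: the sequence is its own reverse complement, so all eight positions pair up. A hit would then be kept only if its score reached the threshold (50 in Heikki's runs).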

We would be interested in any other algorithms we could implement if there are no open source applications available for them.

Hope this helps

Peter Rice
EMBOSS Team
_______________________________________________
EMBOSS mailing list
EMBOSS@lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/emboss
