Aziz,

I guess I hadn't thought about it that way, so here is more info.

What I'm basically doing is randomly pulling a string of 500 characters from
one string and looking for it in another string. So I'm looking for a substring
of the larger string that matches my query string. As for how it should match,
the answer to your questions is all of the above. I don't care whether the
differences are insertions, deletions, or just character changes, as long as
the query string is 80% similar to the subject string.
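
To pin down what "80% similar" could mean when insertions, deletions, and
character changes all count, here is a minimal sketch in Python (not the Perl
module mentioned in this thread). It assumes similarity is measured by plain
edit distance, with each edit costing 1, relative to the query length. The
names edit_distance and is_close_match are made up for illustration, and the
Similarity module may well score things differently.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions cost 1 each."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def is_close_match(query: str, subject: str, min_percent: int = 80) -> bool:
    """True if the edits needed stay within (100 - min_percent)% of the query length."""
    if not query:
        return False
    # Integer arithmetic avoids floating-point surprises at the boundary.
    max_errors = len(query) * (100 - min_percent) // 100
    return edit_distance(query, subject) <= max_errors
```

With a 500-character query and the default of 80, this allows up to 100 edits
of any kind.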

Like I said, I know I can use the module Similarity. But in order to do this I
would need both the query and the subject string. And to get the subject string
I would need to 'slide' down the larger string and pull out every candidate
substring one by one. This is very slow with a 4.5 million character string.
I'm just looking for a way to speed things up.
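
One possible direction, offered only as a sketch and not as something this
thread confirms: instead of extracting each window and scoring it separately,
a single dynamic-programming pass (a technique usually attributed to Sellers)
can track, for every position in the long string, the smallest edit distance
between the query and any substring ending there. The example is in Python
rather than Perl, and best_approximate_match is a made-up name.

```python
from typing import Tuple


def best_approximate_match(query: str, text: str) -> Tuple[int, int]:
    """Return (distance, end): the smallest edit distance between query and
    any substring of text, and the (exclusive) index where that substring ends.
    """
    m = len(query)
    # col[i] = fewest edits between query[:i] and some substring of text
    # ending at the current position. Before any text is read, col[i] = i.
    col = list(range(m + 1))
    best_dist, best_end = col[m], 0
    for j, tc in enumerate(text, start=1):
        prev_diag = col[0]  # value for (i - 1) at the previous text position
        # col[0] stays 0, so a match is free to start at any position.
        for i in range(1, m + 1):
            old = col[i]
            cost = 0 if query[i - 1] == tc else 1
            col[i] = min(old + 1,           # extra character in text
                         col[i - 1] + 1,    # query character left unmatched
                         prev_diag + cost)  # match or substitute
            prev_diag = old
        if col[m] < best_dist:
            best_dist, best_end = col[m], j
    return best_dist, best_end
```

A hit would count as a match if the returned distance is within the allowed
number of edits (100 for a 500-character query at 80%). The work is still
proportional to query length times text length, roughly 500 x 4.5 million cell
updates here, so this avoids re-scoring every window from scratch but is not
free, and pure Python would likely be too slow at that scale.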

BTW, if it helps at all, I'm doing genetic analysis of whole genomes, hence the
4.5 million long string.

-Bob

--- Abdulaziz Ghuloum <[EMAIL PROTECTED]> wrote:
> Hello,
> 
> I don't have a direct answer for your question since your question is a
> little bit ambiguous; let me explain:
> 
> Do you want to search for a substring in a long string, or do you want a
> true regexp match?
> 
> If you want a true regexp match, then the question is even more
> ambiguous.  For example, the regexp /(ab){20}a{10}/ does not match the
> string "ababababababaaaaaaaaa", but what percentage does it match?  How
> do you determine the percentage, let alone match within a given error
> percentage?
> 
> If you just want a substring match (not a regexp), then your problem is
> simpler, but you need to define your criteria more clearly.
> 
> Do you want to allow up to 20% more characters to be inserted to the
> substring and still consider it a match?
> 
> Do you want to match down to 80% of the substring and still consider it a
> match?
> 
> Do you want to allow up to 20% of the characters to be altered (not
> removed or inserted) and still consider it a match?
> 
> Or a combination of the above?
> 
> Without this information, no reply is even remotely correct.  Please
> provide more information about the problem to get closer to the solution
> to your problem.
> 
> Hope this helps,,,
> 
> Aziz,,, 
> 
> In article <[EMAIL PROTECTED]>, "Bob
> Mangold" <[EMAIL PROTECTED]> wrote:
> 
> > Hello,
> > 
> > I'm working on a program where I am searching for a short string within
> > a longer string. The catch is that the long string is about 4.5 million
> > chars long and the short string is about 500. Using a regex to do an
> > exact match is simple, but what if I want just a close match, like 80%
> > or whatever. I've used the module Similarity in the past, but in order
> > to use that I have to send it two strings to compare, which means I'd
> > have to slide down the longer string character by character. Is there a
> > faster way? Is there a module that returns true for a regex match within
> > a certain percentage?
> > 
> > -Bob
> > 
> > __________________________________________________ Do You Yahoo!?
> > Make international calls for as low as $.04/minute with Yahoo! Messenger
> > http://phonecard.yahoo.com/
> 
> -- 
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 

