Re: splitting a search string into tokens

Daniel F. Savarese Sun, 04 Apr 2004 13:04:46 -0700

In message <[EMAIL PROTECTED]>, "Robert Taylor"
 writes:
>I need to parse the search string into tokens in the manner that search
>engines would.


Lexical analysis (i.e., tokenization) and parsing are two separate activities.
Sometimes you can get away with combining the two, but you'll find you can
only do so much with split.  Define a regular expression for each of
your tokens and consume the input matching against each in a specified
order.  In your case, the tokens appear to be \s+ (i.e., the separator,
which would be discarded), \S+, and "[^"]+"?.  You have to test for the last
token first so that whitespace inside a quoted phrase is not mistaken for a
separator.  It so happens that you can manage this with split.  You appear
to have almost gotten there already with:

>Here is what I've tried (but it doesn't cover escaping metacharacters
>which might be in the search string):
>
>    /"(.*?)"|(\w+)/
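
As a rough sketch of the approach described above (one regular expression
per token, tried in a fixed order while consuming the input), something
like the following might work.  It uses java.util.regex rather than ORO so
that it is self-contained, and the class and method names (SearchTokenizer,
tokenize) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ORO or the JDK.
class SearchTokenizer {
    // One pattern per token type. QUOTED is tried first so that
    // whitespace inside a quoted phrase is not treated as a separator.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]+)\"?");
    private static final Pattern SPACE  = Pattern.compile("\\s+");
    private static final Pattern WORD   = Pattern.compile("\\S+");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher quoted = QUOTED.matcher(input);
        Matcher space  = SPACE.matcher(input);
        Matcher word   = WORD.matcher(input);
        int pos = 0;
        int end = input.length();
        while (pos < end) {
            if (quoted.region(pos, end).lookingAt()) {
                // Keep the phrase without its surrounding quotes.
                tokens.add(quoted.group(1));
                pos = quoted.end();
            } else if (space.region(pos, end).lookingAt()) {
                // Separator: consume and discard.
                pos = space.end();
            } else if (word.region(pos, end).lookingAt()) {
                tokens.add(word.group());
                pos = word.end();
            } else {
                // Unreachable in practice: any character is either
                // whitespace or non-whitespace.
                throw new IllegalStateException("no token at offset " + pos);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("apache \"regular expressions\" oro"));
    }
}
```

Because the closing quote is optional in the pattern, an unterminated
phrase runs to the end of the input instead of being split on whitespace.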

I don't understand where your search string and escaped metacharacters
enter the picture.  If you need to escape metacharacters in a string,
use Perl5Compiler.quotemeta.  I hope that helps.
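
To illustrate what escaping buys you, here is a small sketch that builds a
pattern from a token so that its metacharacters match literally.  It
assumes java.util.regex, where Pattern.quote plays a role comparable to
Perl5Compiler.quotemeta in ORO; the class and method names
(LiteralTokenMatch, containsToken) are invented for the example.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of ORO or the JDK.
class LiteralTokenMatch {
    // Returns true if text contains token as a literal substring.
    // Pattern.quote escapes the token so that characters such as
    // '+', '.', '(' and ')' lose their regex meaning.
    static boolean containsToken(String text, String token) {
        Pattern literal = Pattern.compile(Pattern.quote(token));
        return literal.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsToken("Learning c++ (beta) today", "c++ (beta)"));
        System.out.println(containsToken("axb", "a.b"));
    }
}
```

Without the quoting step, a token such as a.b would also match axb, and a
token such as c++ (beta) would not match its own text.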

daniel


