I thought about the ParseTree class a bit today and how it should
work. OK, it was a beautiful day and I was outside too. ;-)
I'm not too tied to any of the details (especially names), but I
figured that it might help to voice some of my thoughts to the list
before hammering away at anything.
For each operator, there should be a derived class. These will each
have their own way of parsing a query and combining subtrees (i.e.
results). So there will be a minimum of an AndParseTree, OrParseTree,
NotParseTree, and ExactParseTree. Others could be added pretty easily
from there (e.g. NearParseTree).
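
To make the idea concrete, here is a rough C++ sketch of how the
operator subclasses might each combine their subtrees. It is only an
illustration under some assumptions: results are plain sets of
document IDs (DocSet stands in for ResultList), NOT is treated as a
binary "left but not right" operator, and CombineChildren and Leaf are
hypothetical helpers. None of this is existing ht://Dig code.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <memory>
#include <set>
#include <utility>

// Assumption: a set of document IDs stands in for ResultList.
using DocSet = std::set<int>;

// Hypothetical base class; a plain node is a leaf that just holds results.
struct ParseTree {
    std::unique_ptr<ParseTree> left, right;
    DocSet results;
    virtual ~ParseTree() = default;
    virtual void Combine() {}  // leaves have nothing to combine
};

// Shared step for binary operators: both branches must exist,
// and each must combine its own subtrees first.
inline void CombineChildren(ParseTree &node) {
    assert(node.left && node.right);
    node.left->Combine();
    node.right->Combine();
}

struct AndParseTree : ParseTree {
    void Combine() override {  // documents present in both branches
        CombineChildren(*this);
        results.clear();
        std::set_intersection(left->results.begin(), left->results.end(),
                              right->results.begin(), right->results.end(),
                              std::inserter(results, results.end()));
    }
};

struct OrParseTree : ParseTree {
    void Combine() override {  // documents present in either branch
        CombineChildren(*this);
        results.clear();
        std::set_union(left->results.begin(), left->results.end(),
                       right->results.begin(), right->results.end(),
                       std::inserter(results, results.end()));
    }
};

struct NotParseTree : ParseTree {
    void Combine() override {  // documents in left that are not in right
        CombineChildren(*this);
        results.clear();
        std::set_difference(left->results.begin(), left->results.end(),
                            right->results.begin(), right->results.end(),
                            std::inserter(results, results.end()));
    }
};

// Hypothetical helper: build a leaf node that already holds its results.
inline std::unique_ptr<ParseTree> Leaf(DocSet docs) {
    auto node = std::make_unique<ParseTree>();
    node->results = std::move(docs);
    return node;
}
```

With this shape, adding something like a NearParseTree would mean one
more subclass with its own Combine, provided the result type also
carried word positions (which a bare set of IDs does not).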
The "boolean" method will really correspond to the base class, since a
boolean query is composed of operators. The base-class Parse method
would pick subclasses as appropriate. For any other method, htsearch
will create the appropriate ParseTree subclass directly and hand off
the user query.
The class would be responsible for assembling itself to fit the query
as well as merging results after searching. (I'm not sure whether it
should do the searching itself or if it should be more of a
container.) Of course, before the search is performed, the Fuzzy
method should be called with the list of fuzzy objects. Doing it this
way allows some algorithms to be applied selectively if necessary. A
Fuzzy object essentially already returns a StringList, so the
Parse(StringList *) method can be used.
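
As a rough illustration of the fuzzy step, the sketch below expands
one query word through a list of fuzzy algorithms and collects the
variants into a single word list, which is what would then be handed
to Parse(StringList *). Everything here is an assumption for the sake
of the example: WordList stands in for StringList, FuzzyFn for a Fuzzy
object, and Exact and StripPluralS are made-up algorithms.

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <vector>

// Assumptions: WordList stands in for StringList, and a fuzzy
// algorithm is modelled as a function from a word to its variants.
using WordList = std::vector<std::string>;
using FuzzyFn = std::function<WordList(const std::string &)>;

// Run every fuzzy algorithm on the word and gather the variants,
// skipping duplicates and keeping first-seen order.
WordList ExpandWord(const std::string &word,
                    const std::vector<FuzzyFn> &fuzzies) {
    WordList out;
    for (const auto &fuzzy : fuzzies) {
        for (const auto &variant : fuzzy(word)) {
            if (std::find(out.begin(), out.end(), variant) == out.end())
                out.push_back(variant);
        }
    }
    return out;
}

// Hypothetical algorithm: the word itself.
WordList Exact(const std::string &word) { return {word}; }

// Hypothetical algorithm: drop one trailing 's', if there is one.
WordList StripPluralS(const std::string &word) {
    if (word.size() > 1 && word.back() == 's')
        return {word.substr(0, word.size() - 1)};
    return {};
}
```

A subclass that wants to ignore fuzzy matching (an exact-match node,
say) could simply pass a list containing only Exact.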

class ParseTree
{
public:
    // Like the List and Stack and other container classes:
    // Release disconnects the branches
    void Release();
    // Destroy disconnects branches AND frees them
    void Destroy();

    // Parse either a base string itself or a list of strings
    // Returns either OK or NOTOK as to the correctness of the query
    virtual int Parse(String);
    virtual int Parse(StringList *);

    // Combine right and left lists (if present) according to our
    // specific operator type (e.g. AND, OR, NOT, NEAR, etc.)
    virtual void Combine();

    // If passed a list of Fuzzy methods, use them to fill out the tree
    // (note that some subclasses may choose to ignore this if desired)
    virtual void Fuzzy(List);

private:
    // One or the other of these could be empty
    ParseTree *right;
    ParseTree *left;
    WeightWord data;
    ResultList *results;

    // [Various helper routines for cleaning up user input,
    //  e.g. trimming punctuation, etc.]
};

The WeightWord and ResultList classes will need some modifications as
well, not the least of which is adding a mask to WeightWord to allow
searches to be restricted based on field.
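
A minimal sketch of what such a mask could look like, assuming one bit
per field. The flag names, the fieldMask member, and MatchesField are
all hypothetical and only meant to show the bit test; the real
WeightWord has its own members and the real field flags may differ.

```cpp
#include <string>

// Hypothetical field flags, one bit per part of the document.
enum FieldFlag : unsigned {
    FIELD_BODY     = 1u << 0,
    FIELD_TITLE    = 1u << 1,
    FIELD_KEYWORDS = 1u << 2,
    FIELD_ALL      = ~0u
};

// Simplified stand-in for WeightWord with the proposed mask added.
struct WeightWord {
    std::string word;
    double weight = 1.0;
    unsigned fieldMask = FIELD_ALL;  // default: no field restriction
};

// A word occurrence counts only if it was found in at least one of
// the fields the query word is restricted to.
bool MatchesField(const WeightWord &w, unsigned occurrenceFields) {
    return (w.fieldMask & occurrenceFields) != 0;
}
```

For example, a word whose fieldMask is FIELD_TITLE would match an
occurrence flagged FIELD_TITLE | FIELD_BODY but not one flagged only
FIELD_BODY.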
One final concern: this model makes it very difficult to add the
AltaVista url: syntax for restricting results by URL. I'm not sure how
a ParseTree should pass up information like this about how the search
should be performed. For now I'm content to shelve the problem until
we can get a clean, working replacement for the current code. ;-)
What do people think?
-Geoff