Hi, after digging through the code a bit I came up with some questions about making the highlighter work with the upcoming release of Lucene (1.3). The questions are:
- Why does PhraseQuery use a Vector while PhrasePrefixQuery uses an ArrayList? Just curious.
- Is it possible to add a method "public Term[] getTermsArray()" that returns the "termArrays" from the PhrasePrefixQuery? Is it still populated after we run the search?
- Is it possible to have a PhrasePrefixQuery with 2+ terms, e.g. "Microsoft Soft* Windo*"? Why are there 2 methods, one to add a single term and another to add more than one term? Is the termsArray an array of Term arrays?
- Is it correct that PrefixQuery.rewrite(...) is called by the searcher (reader?) at search time to produce a BooleanQuery with an "OR" condition between the clauses? Does each clause hold a TermQuery?
- PrefixQuery > What do you think of this scenario: the user calls "populateTermArray()" before running the search. We set a static variable inside the Query class so the setting is reflected in all the XxxQuery classes. In the 'rewrite' method we check this value and, if it is true (default false), we store each term in an array 'termsArray', one for each implementation (wildcard, etc.). When we need to highlight, we call getTermsArray() on each query based on its instance type (again: wildcard, etc.). Then we set the array to null or wait for the garbage collector to release this resource. Sounds good?
- PrefixQuery and other query classes that have this 'rewrite' method >> Can the method be called more than once at search time? If so, we should keep the previous array of terms and add the new terms to it without duplicates.
- RangeQuery >> Can we apply the same approach as for PrefixQuery?
- All the classes that extend MultiTermQuery >> Can we apply the same approach as for PrefixQuery? (As above: just add a Vector that holds the terms, if the user wants it, read this array when highlighting, and maybe call a method to release the resource after we are done with the highlight.)
- How is it possible to get the term position of a particular term in a particular document in the index?
  This would greatly simplify getting the start and end offsets of a term in a document. I assume that a text version of the field to highlight is available, e.g. the content of an HTML page is a field and is stored in a single text file. It would also keep the highlighter consistent with the tokenizer, since we would use the same one we used at indexing time and avoid writing a RegExp pattern for each criterion (actually, that would not be necessary anymore!).
- Would all these changes make the search process slower? As a guess, by how much?
- Would the term position call be slow?

Thank you guys.
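
P.S. To make the "populateTermArray()" idea a bit more concrete, here is a rough sketch of what I have in mind. It is plain Java and does not use any Lucene classes: TermCollectingPrefixQuery, its rewrite(...) signature, getTermsArray() and releaseTerms() are all hypothetical names of mine, the "index" is just a sorted set of strings, and I used a per-query flag instead of the static variable only to keep the example small. The point is that a set which drops duplicates makes it harmless if rewrite is called more than once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical stand-in for a prefix query; this is NOT Lucene's PrefixQuery.
class TermCollectingPrefixQuery {
    private final String prefix;
    private final boolean collectTerms; // plays the role of "populateTermArray()"
    // LinkedHashSet keeps insertion order and ignores duplicates,
    // so repeated rewrite() calls do not add a term twice.
    private final Set<String> collected = new LinkedHashSet<>();

    TermCollectingPrefixQuery(String prefix, boolean collectTerms) {
        this.prefix = prefix;
        this.collectTerms = collectTerms;
    }

    // Models the expansion step: returns every index term that starts with the prefix.
    // (Assumption: in Lucene each of these would end up as one clause of the rewritten query.)
    List<String> rewrite(SortedSet<String> indexTerms) {
        List<String> expanded = new ArrayList<>();
        // In a sorted set all terms sharing the prefix are contiguous,
        // starting at the first term >= prefix.
        for (String term : indexTerms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            expanded.add(term);
        }
        if (collectTerms) {
            collected.addAll(expanded);
        }
        return expanded;
    }

    // What the highlighter would call after the search.
    String[] getTermsArray() {
        return collected.toArray(new String[0]);
    }

    // Explicit release instead of waiting for the garbage collector.
    void releaseTerms() {
        collected.clear();
    }

    public static void main(String[] args) {
        SortedSet<String> index =
            new TreeSet<>(Arrays.asList("soft", "software", "solid", "windows"));
        TermCollectingPrefixQuery query = new TermCollectingPrefixQuery("sof", true);
        query.rewrite(index);
        query.rewrite(index); // second call adds nothing new
        System.out.println(Arrays.toString(query.getTermsArray())); // prints [soft, software]
    }
}
```

With the flag set to false nothing is stored, so searches that do not need highlighting would only pay for one boolean check per rewrite in this sketch.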
