On Fri, 26 Sep 2003 16:11:04 +0200 (Romance Daylight Time) Vadim Zeitlin <[EMAIL PROTECTED]> wrote:
> On Fri, 26 Sep 2003 15:37:50 +0200 Xavier Nodet <[EMAIL PROTECTED]> wrote: > XN> All of those case share the same properties: we repeatedly look for the > XN> same fixed pattern in a huge set of strings. The complications come from > XN> the fact that we (sometimes) allow wildcards in the pattern, and we want > XN> to be able to do case-insensitive searches. > > Actually by far the most often use of wildcards is to allow for "foo*" > (90%) and "*foo*" (5%), so I agree that we could drop them [in other cases]. Ok. > XN> But the most time-consuming task seems to be to convert *all* the > XN> searched strings to lower case so that the search is actually > XN> case-insensitive. Would it be possible to cache the lower-case version > XN> of the strings used in the address-book? > Yes, AdbEntry could do this. Correct. An AdbEntry would have additional members for the lower-case versions of all the searchable strings. We just have to reset them when the original string changes. > Or, rather, Matches() could do it (it would > just cache the values on the first call). Yes, but we still need to reset them from time to time, and store them as additional members. Those changes would be global to the AdbEntryStoredInMemory class anyway, no? > This seems like a pretty simple thing to do, so if this is really the > bottleneck, we're saved! The number in my first message show that lowering a string can be pretty time-consuming. I must had that in this example, the whole string was in upper-case at the beginning, so the speed-up will not necessarily be so huge in M as in the example. But still... > XN> With respect to the search algorithm itself, I implemented a very simple > XN> one given (among tons of others) in > XN> > XN> http://www-igm.univ-mlv.fr/~lecroq/string/node19.html#SECTION00190 > XN> > XN> The code pretty simple and quite effective. As for regexps, it requires > XN> that the pattern is 'compiled' before starting any search. > > This shouldn't be a problem... as search takes literally seconds the > compilation time must be trivial. But this must not be done for each search (I mean, we must not compile the pattern each time we want to compare it to *one* string): it is then longer than the current code. We need to give the wxQSPattern instance to the Matches method, so we have to change this code a little bit. > I do wonder about something else: surely there are some algorithms which > could benefit from compiling the search space. This is how the URL detector works. > For ADB it would make sense to compile it on (or soon after) startup > if it were not *too* time consuming (although even then we could do it > in a separate thread...) and if it is possibly to efficiently update > the compiled object when just one string is changed/added/deleted. I'm not so sure this can be done efficiently. And the point of compiling the search space is actually that it does not change from search to search, after all... > Do you know about anything like this? Actually, I can't think of any feature that would benefit from compiling the set of strings in the ADB. Do we have any feature for which we need to answer the following question: Which, if any, of the ADB entries is included in this (rather long) text? -- Xavier Nodet "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." - Benjamin Franklin, 1759.
pgp00000.pgp
Description: PGP signature