On Tue, 17 Apr 2001, Stefan Reich wrote:

> From: "toad" <matthew at toseland.f9.co.uk>
> > Randomized exponential insertion? Just going 2,4,8,... suggests easy DoS
> > :)
>
> Yeah, must be randomized.
>
> Jan - I rediscovered your suggestion in my mailbox (you suggested keyword
> files split by first letter). It would be feasible, but I think the "Lucene
> method" is simpler to implement and scales better. (Searching for parts of
> keywords might be much harder to implement with Lucene though...)
>
I've seen regexp searches done in search engines by indexing the word table in character pairs, then analysing the regexp, breaking it apart into character pairs and using those as the keys. For example, if the word "apple" is a new keyword, index "_a", "ap", "pp", "pl", "le" and "e_" so that each pair refers back to the term apple (as well as to any other words that contain that pair). When searching for a regexp like app.*, break it into the keys "_a", "ap" and "pp", and apply the regexp to the terms that come back from the search over the character-pair index. The terms that match the regexp are then expanded into the original query (e.g. apple OR appliance OR application).

This algorithm is a little complicated and can be more expensive than a simple table scan of all the unique words in the index. Character-pair indexing (and even triples and quads) can also be useful in OCR / fuzzy recognition and spell-check applications.

_______________________________________________
Devl mailing list
Devl at freenetproject.org
http://lists.freenetproject.org/mailman/listinfo/devl
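A minimal Python sketch of the character-pair scheme described in the message, under assumptions that are not from the original: terms are lowercase alphanumeric, only the literal prefix of the regexp is turned into keys (as in the app.* example), and patterns with alternation or no usable prefix fall back to the plain scan of all unique words. The names `PairIndex`, `char_pairs` and `prefix_keys` are hypothetical and do not come from Lucene or Freenet.

```python
import re
from collections import defaultdict


def char_pairs(word: str) -> list[str]:
    """Overlapping character pairs, with '_' marking the word edges.

    "apple" -> ["_a", "ap", "pp", "pl", "le", "e_"]
    """
    padded = f"_{word}_"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def prefix_keys(pattern: str) -> list[str]:
    """Pair keys implied by the literal prefix of a regexp (an assumption:
    only the prefix is analysed). "app.*" -> ["_a", "ap", "pp"].
    Returns [] when no safe prefix exists."""
    if "|" in pattern:
        return []  # alternation: the prefix may not apply to every branch
    prefix = re.match(r"[a-z0-9]*", pattern).group(0)
    rest = pattern[len(prefix):]
    if rest[:1] in ("?", "*", "{"):
        prefix = prefix[:-1]  # a quantifier may make the last char optional
    padded = "_" + prefix  # leading '_' anchors the prefix at the word start
    return [padded[i:i + 2] for i in range(len(prefix))]


class PairIndex:
    """Hypothetical in-memory index: character pair -> terms containing it."""

    def __init__(self) -> None:
        self.terms: set[str] = set()
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add_term(self, term: str) -> None:
        self.terms.add(term)
        for pair in char_pairs(term):
            self.postings[pair].add(term)

    def expand(self, pattern: str) -> list[str]:
        """Terms matching the whole regexp, to be OR-ed into the query."""
        keys = prefix_keys(pattern)
        if keys:
            # A match must contain every key pair, so intersect the postings.
            pool = set.intersection(*(self.postings.get(k, set()) for k in keys))
        else:
            pool = self.terms  # no usable keys: fall back to a full scan
        regex = re.compile(pattern)
        # The pairs only narrow the pool; the regexp check makes it exact.
        return sorted(t for t in pool if regex.fullmatch(t))


idx = PairIndex()
for word in ["apple", "appliance", "application", "map", "happy"]:
    idx.add_term(word)
print(idx.expand("app.*"))  # ['apple', 'appliance', 'application']
```

The pair lookup can return false positives (any term containing all the pairs), which is why every candidate is re-checked against the regexp; with a small vocabulary that extra work is where the scheme can end up costing more than the simple table scan.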
