On Tue, 17 Apr 2001, Stefan Reich wrote:

> From: "toad" <matthew at toseland.f9.co.uk>
> > Randomized exponential insertion? Just going 2,4,8,... suggests easy DoS :)
> 
> Yeah, must be randomized.
> 
> Jan - I rediscovered your suggestion in my mailbox (you suggested keyword
> files split by first letter). It would be feasible, but I think the "Lucene
> method" is simpler to implement and scales better. (Searching for parts of
> keywords might be much harder to implement with Lucene though...)
> 

I've seen regular-expression search done in search engines by indexing the
word table by character pairs, then analysing the regular expression,
breaking its literal parts into character pairs and using those pairs as
the lookup keys.

For example, if the word "apple" is a new keyword, index "_a", "ap", "pp",
"pl", "le" and "e_" (where "_" marks a word boundary) so that each pair
refers back to the term apple, as well as to any other words containing
that pair.  When searching for a regular expression like app.*, break it
into the keys "_a", "ap" and "pp", look those keys up in the
character-pair index, and apply the regular expression to the terms that
come back.  The terms that match are then expanded into the original
query (e.g. apple OR appliance OR application).  This algorithm is
somewhat complicated and can be more expensive than a simple scan of all
the unique words in the index.  Character-pair indexing (and even triples
and quads) can also be useful in OCR / fuzzy recognition and spell-check
applications.
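
A minimal Python sketch of the character-pair scheme, for illustration
only.  The names char_pairs, prefix_keys and PairIndex are hypothetical,
and prefix_keys is a deliberately simplified stand-in for the "analyse the
regular expression" step: it only takes pairs from the pattern's leading
literal run and falls back to scanning every term when the pattern
contains an alternation.  The pairs only narrow the candidate set; the
regular expression itself still decides which terms match.

```python
import re
from collections import defaultdict


def char_pairs(word):
    """Boundary-padded character pairs: 'apple' -> _a ap pp pl le e_."""
    padded = "_" + word + "_"  # "_" marks the start and end of the word
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def prefix_keys(pattern):
    """Hypothetical key extraction: pairs from the leading literal run only.

    Returns [] (meaning "no filtering, scan every term") when keys cannot
    be derived safely, e.g. for alternations.
    """
    if "|" in pattern:
        return []  # an alternative could match words without this prefix
    i = 0
    while i < len(pattern) and pattern[i].isalnum():
        i += 1
    literal = pattern[:i]
    if i < len(pattern) and pattern[i] in "*?{":
        literal = literal[:-1]  # a quantifier makes the last literal char optional
    # fullmatch() in expand() anchors the pattern at the word start, so the
    # leading "_" boundary pair is valid; the end stays open.
    padded = "_" + literal
    return [padded[j:j + 2] for j in range(len(padded) - 1)]


class PairIndex:
    """Maps each character pair to the set of indexed terms containing it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.terms = set()

    def add_term(self, term):
        self.terms.add(term)
        for pair in char_pairs(term):
            self.postings[pair].add(term)

    def candidates(self, keys):
        if not keys:
            return set(self.terms)  # nothing to filter on: full scan
        # a term can only match if it contains every key pair
        return set.intersection(*(self.postings.get(k, set()) for k in keys))

    def expand(self, pattern):
        """Return the indexed terms the pattern matches, to be OR-ed into the query."""
        regex = re.compile(pattern)
        keys = prefix_keys(pattern)
        return sorted(t for t in self.candidates(keys) if regex.fullmatch(t))


if __name__ == "__main__":
    index = PairIndex()
    for word in ["apple", "appliance", "application", "happy", "maple"]:
        index.add_term(word)
    print(prefix_keys("app.*"))                # ['_a', 'ap', 'pp']
    print(" OR ".join(index.expand("app.*")))  # apple OR appliance OR application
```

Because the keys are only a filter, an overly conservative key set (even
an empty one) costs extra regex evaluations but never loses a match, which
is why the simplified extractor errs toward returning fewer keys.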


_______________________________________________
Devl mailing list
Devl at freenetproject.org
http://lists.freenetproject.org/mailman/listinfo/devl