On Mon, Jul 21, 2008 at 11:45 AM, Dean Landolt <[EMAIL PROTECTED]> wrote:
> On Mon, Jul 21, 2008 at 1:08 AM, Dan Reverri <[EMAIL PROTECTED]> wrote:
>
>> Is it worthwhile to implement a full text indexer on top of CouchDB's
>> map/reduce functionality?
>>
>> http://wiki.apache.org/couchdb/FullTextIndexWithView
>>
>
>
> Interesting idea. There's definitely more to FTI than tokenization alone,
> but then again there's an awful lot of power in m/r and javascript: it
> didn't take me a second to find a porter stemming algorithm in js:
> http://tartarus.org/~martin/PorterStemmer/js.txt
>
> I bet variable weighting would be pretty close to impossible in the m/r
> paradigm though, and probably some other features (of course, I could be
> wrong, and when it comes to couchdb, thus far I usually am). For a
> straight-up word search, this is serviceable as is. I'm going to see if I
> can't figure out how to shoehorn in some boolean features.
>

I gave this approach another look and was able to put together a view that does a little more: stemming, optional case-insensitivity, a minimum token length, and better whitespace handling. I'm also working on an n-gram view, and so far it looks promising.

But there's still one huge problem: for the life of me, I can't figure out a workable strategy for boolean operations that doesn't involve fully loading the view results for each piece of the query. Am I missing something? Is something like this even possible? I know there's no way to read part of one view from inside another view, but I really wish there were.
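For anyone following along, here is a rough sketch of the kind of tokenizing view described above, together with the client-side AND that the boolean question is trying to avoid. It is not the view from this thread. Assumptions, all hypothetical: the text lives in `doc.body`, `MIN_TOKEN_LENGTH` and `CASE_INSENSITIVE` are illustrative settings, and stemming is left out (a Porter stemmer such as the one linked above could be applied to each token before `emit`). Because it runs outside CouchDB, `emit()` is a local stand-in, and `buildView`/`idsFor` simulate building the view and querying it with a `key` parameter.

```javascript
// Sketch only: a CouchDB-style full-text map function plus a client-side
// boolean AND. Hypothetical assumptions: text lives in doc.body, the option
// values below are illustrative, and stemming is omitted.

const MIN_TOKEN_LENGTH = 3;    // hypothetical: drop very short tokens
const CASE_INSENSITIVE = true; // hypothetical: fold case before indexing

// Stand-in for CouchDB's built-in emit(); inside CouchDB the row's doc ID
// is attached automatically, so here we track it in currentId.
let rows = [];
let currentId = null;
function emit(key, value) {
  rows.push({ id: currentId, key: key, value: value });
}

// The map function: one row per distinct token in the document.
function map(doc) {
  if (typeof doc.body !== "string") return;
  const text = CASE_INSENSITIVE ? doc.body.toLowerCase() : doc.body;
  const seen = new Set();
  // Split on runs of non-word characters (JS \W: anything other than
  // ASCII letters, digits, and underscore), which also collapses whitespace.
  for (const token of text.split(/\W+/)) {
    if (!token || token.length < MIN_TOKEN_LENGTH || seen.has(token)) continue;
    seen.add(token);
    emit(token, null);
  }
}

// Simulate building the view over a set of documents.
function buildView(docs) {
  rows = [];
  for (const doc of docs) {
    currentId = doc._id;
    map(doc);
  }
  return rows;
}

// Simulate querying the view with key=term: returns the doc IDs, sorted
// with JS string order so the merge below can compare them with < .
function idsFor(viewRows, term) {
  const key = CASE_INSENSITIVE ? term.toLowerCase() : term;
  return viewRows.filter(r => r.key === key).map(r => r.id).sort();
}

// Merge-intersect two sorted ID lists in linear time.
function intersectSorted(a, b) {
  const out = [];
  let i = 0, j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) { out.push(a[i]); i++; j++; }
    else if (a[i] < b[j]) i++;
    else j++;
  }
  return out;
}

// Boolean AND over several terms. Every term's full ID list is loaded
// before combining, which is exactly the cost raised in the question.
function andQuery(viewRows, terms) {
  if (terms.length === 0) return [];
  return terms
    .map(t => idsFor(viewRows, t))
    .reduce((acc, ids) => intersectSorted(acc, ids));
}

// Usage
const docs = [
  { _id: "a", body: "The quick brown fox" },
  { _id: "b", body: "A quick  red\tfox jumps" },
  { _id: "c", body: "Slow brown dog" },
];
const view = buildView(docs);
console.log(andQuery(view, ["quick", "fox"])); // → ["a", "b"]
console.log(andQuery(view, ["brown", "fox"])); // → ["a"]
```

Sorting the ID lists and merge-intersecting keeps the combine step linear in the list sizes, but it does nothing to avoid transferring every term's rows first, so it illustrates the limitation rather than solving it.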
