Tim, you answered your own question: don't use ispell :)
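For readers following along, here is a minimal sketch of what "don't use ispell" could look like in practice. It assumes a fresh configuration (the name `timusan_snowball` is hypothetical) built on the default parser, with the word token types mapped only to the built-in Snowball dictionary `english_stem`. Snowball emits a single stem per token, so the index side stores one lexeme where the Ispell dictionary stored two.

```sql
-- Hypothetical configuration name; start from the default parser with no mappings.
CREATE TEXT SEARCH CONFIGURATION timusan_snowball ( PARSER = pg_catalog."default" );

-- Map the common word token types to the built-in Snowball stemmer only.
-- Token types not listed here (numbers, URLs, ...) are simply not indexed.
ALTER TEXT SEARCH CONFIGURATION timusan_snowball
    ADD MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part
    WITH english_stem;

-- Expected to yield a single stemmed lexeme for 'smiling'.
SELECT to_tsvector('timusan_snowball', 'smiling');

-- Query side is normalized with the same configuration, so the match still works.
SELECT to_tsvector('timusan_snowball', 'smiling')
       @@ to_tsquery('timusan_snowball', 'smile');

-- ts_debug shows, per token, which dictionary handled it and which lexemes it emitted.
SELECT alias, token, dictionary, lexemes
FROM ts_debug('timusan_snowball', 'smiling');
```

Running the same `ts_debug` call against the Ispell-based configuration should show the `timusan_ispell` dictionary returning both lexemes for the single token, which is where the two entries at the same position come from.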
On Sat, May 3, 2014 at 1:45 AM, Tim van der Linden <t...@shisaa.jp> wrote:
> On Fri, 2 May 2014 21:12:56 +0400
> Oleg Bartunov <obartu...@gmail.com> wrote:
>
> Hi Oleg
>
> Thanks for the response!
>
>> Yes, it's normal for an ispell dictionary; think of it as a morphological dictionary.
>
> Hmm, I see, that makes sense. I thought the morphological aspect of
> Ispell only dealt with splitting up compound words, but it also deals with
> reducing the word to a more "stem"-like form, correct?
>
> As a last question on this, is there a way to stop this dictionary from
> emitting multiple lexemes?
>
> The reason I am asking is that, in my (fairly new) understanding of
> PostgreSQL's full text search, it is always best to have as few lexemes as
> possible saved in the vector, to get smaller indexes and faster matching
> afterwards. Also, if you run a tsquery afterwards too, you can still employ
> the power of these multiple lexemes to find a match.
>
> Or...probably answering my own question...if I do not desire this behavior I
> should maybe not use Ispell and simply use another dictionary :)
>
> Thanks again.
>
> Cheers,
> Tim
>
>> On Fri, May 2, 2014 at 11:54 AM, Tim van der Linden <t...@shisaa.jp> wrote:
>> > Good morning/afternoon all
>> >
>> > I am currently writing a few articles about PostgreSQL's full text
>> > capabilities and have a question about the Ispell dictionary which I
>> > cannot seem to find an answer to. It is probably a very simple issue, so
>> > forgive my ignorance.
>> >
>> > In one article I am explaining dictionaries, and I have set up a
>> > sample configuration which maps most token categories to only use an Ispell
>> > dictionary (timusan_ispell) which has a default configuration:
>> >
>> > CREATE TEXT SEARCH DICTIONARY timusan_ispell (
>> >     TEMPLATE = ispell,
>> >     DictFile = en_us,
>> >     AffFile = en_us,
>> >     StopWords = english
>> > );
>> >
>> > When I run a simple query like "SELECT
>> > to_tsvector('timusan-ispell','smiling')" I get back the following tsvector:
>> >
>> > 'smile':1 'smiling':1
>> >
>> > As you can see, I get two lexemes with the same position.
>> > The question here is: why does this happen?
>> >
>> > Is it normal behavior for the Ispell dictionary to emit multiple lexemes
>> > for a single token? And if so, is this efficient? I mean, why could it not
>> > simply save one lexeme 'smile' which (same as the snowball dictionary)
>> > would match 'smiling' as well if later matched with the accompanying
>> > tsquery?
>> >
>> > Thanks!
>> >
>> > Cheers,
>> > Tim
>> >
>> >
>> > --
>> > Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> > To make changes to your subscription:
>> > http://www.postgresql.org/mailpref/pgsql-general
>
>
> --
> Tim van der Linden <t...@shisaa.jp>