Re: [GENERAL] Full text: Ispell dictionary

Oleg Bartunov Fri, 02 May 2014 15:25:28 -0700

Tim,

you answered your own question - don't use ispell :)
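
For illustration, a minimal sketch of a configuration that relies on the
built-in Snowball stemmer instead, which emits at most one lexeme per token.
The configuration name timusan_snowball is hypothetical and does not appear
in this thread; the rest assumes a stock PostgreSQL installation with the
built-in english configuration and english_stem dictionary.

```sql
-- Hypothetical configuration name; starts as a copy of the built-in English setup.
CREATE TEXT SEARCH CONFIGURATION timusan_snowball ( COPY = pg_catalog.english );

-- Map the word-like token types to the Snowball dictionary only.
-- (The copied configuration already does this; it is spelled out for clarity.)
ALTER TEXT SEARCH CONFIGURATION timusan_snowball
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH english_stem;

-- Inspect what the dictionary alone emits for a single token.
SELECT ts_lexize('english_stem', 'smiling');

-- Build a tsvector with the new configuration.
SELECT to_tsvector('timusan_snowball', 'smiling');
```

Running ts_lexize against each dictionary in turn is a quick way to compare
what ispell and snowball emit for the same token.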


On Sat, May 3, 2014 at 1:45 AM, Tim van der Linden <t...@shisaa.jp> wrote:
> On Fri, 2 May 2014 21:12:56 +0400
> Oleg Bartunov <obartu...@gmail.com> wrote:
>
> Hi Oleg
>
> Thanks for the response!
>
>> Yes, that's normal for an ispell dictionary; think of it as a morphological dictionary.
>
> Hmm, I see, that makes sense. I thought the morphological aspect of Ispell 
> only dealt with splitting up compound words, but it also reduces a word to 
> a more "stem"-like form, correct?
>
> As a last question on this, is there a way to stop this dictionary from 
> emitting multiple lexemes?
>
> The reason I am asking is that, in my (fairly new) understanding of 
> PostgreSQL's full text search, it is always best to have as few lexemes as 
> possible saved in the vector, to get smaller indexes and faster matching 
> afterwards. Also, if you run a tsquery afterwards, you can still employ 
> the power of these multiple lexemes to find a match.
>
> Or...probably answering my own question...if I do not desire this behavior I 
> should maybe not use Ispell and simply use another dictionary :)
>
> Thanks again.
>
> Cheers,
> Tim
>
>> On Fri, May 2, 2014 at 11:54 AM, Tim van der Linden <t...@shisaa.jp> wrote:
>> > Good morning/afternoon all
>> >
>> > I am currently writing a few articles about PostgreSQL's full text 
>> > capabilities and have a question about the Ispell dictionary which I 
>> > cannot seem to find an answer to. It is probably a very simple issue, so 
>> > forgive my ignorance.
>> >
>> > In one article I am explaining dictionaries and I have set up a 
>> > sample configuration which maps most token categories to use only an 
>> > Ispell dictionary (timusan_ispell), which has a default configuration:
>> >
>> > CREATE TEXT SEARCH DICTIONARY timusan_ispell (
>> >         TEMPLATE = ispell,
>> >         DictFile = en_us,
>> >         AffFile = en_us,
>> >         StopWords = english
>> > );
>> >
>> > When I run a simple query like "SELECT 
>> > to_tsvector('timusan-ispell','smiling')" I get back the following tsvector:
>> >
>> > 'smile':1 'smiling':1
>> >
>> > As you can see I get two lexemes with the same pointer.
>> > The question here is: why does this happen?
>> >
>> > Is it normal behavior for the Ispell dictionary to emit multiple lexemes 
>> > for a single token? And if so, is this efficient? I mean, why could it not 
>> > simply save one lexeme 'smile' which (same as the snowball dictionary) 
>> > would match 'smiling' as well if later matched with the accompanying 
>> > tsquery?
>> >
>> > Thanks!
>> >
>> > Cheers,
>> > Tim
>> >
>> >
>> > --
>> > Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> > To make changes to your subscription:
>> > http://www.postgresql.org/mailpref/pgsql-general
>
>
> --
> Tim van der Linden <t...@shisaa.jp>


