Re: [GENERAL] Full text: Ispell dictionary

2014-05-09 Thread Tim van der Linden
Hi Oleg

> btw, take a look at contrib/dict_xsyn, it's more powerful than the
> synonym dictionary.

Sorry for the late reply...and thank you for the tip.

I will check out xsyn soon. I am about to finish the third and final chapter of 
my full text series, but I could maybe write an appendix chapter which 
mentions xsyn...or just update my posts.

Cheers,
Tim

-- 
Tim van der Linden t...@shisaa.jp


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Full text: Ispell dictionary

2014-05-07 Thread Oleg Bartunov
btw, take a look at contrib/dict_xsyn, it's more powerful than the
synonym dictionary.
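
For readers who want to try the suggestion, here is a minimal sketch of enabling dict_xsyn. It assumes the contrib module is installed on the server and that the sample rules file shipped with it (xsyn_sample.rules) is present in the tsearch_data directory; the test word is only an illustration and is not taken from this thread.

```sql
-- Installing the extension creates a template (xsyn_template)
-- and a dictionary (xsyn) with default parameters.
CREATE EXTENSION IF NOT EXISTS dict_xsyn;

-- Assumption: the sample rules file from the contrib module is installed.
-- KEEPORIG controls whether the original word is kept in the output;
-- MATCHSYNONYMS controls whether synonyms themselves are accepted as input.
ALTER TEXT SEARCH DICTIONARY xsyn (
    RULES = 'xsyn_sample',
    KEEPORIG = true,
    MATCHSYNONYMS = true
);

-- Inspect what the dictionary emits for one token.
SELECT ts_lexize('xsyn', 'supernova');
```

The result depends entirely on the rules file, so it is worth checking with ts_lexize before mapping the dictionary into a configuration.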


[GENERAL] Full text: Ispell dictionary

2014-05-02 Thread Tim van der Linden
Good morning/afternoon all

I am currently writing a few articles about PostgreSQL's full text capabilities 
and have a question about the Ispell dictionary which I cannot seem to find an 
answer to. It is probably a very simple issue, so forgive my ignorance.

In one article I am explaining dictionaries, and I have set up a sample
configuration which maps most token categories to use only an Ispell dictionary
(timusan_ispell) which has a default configuration:

CREATE TEXT SEARCH DICTIONARY timusan_ispell (
TEMPLATE = ispell,
DictFile = en_us,
AffFile = en_us,
StopWords = english
);
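
The message does not show the mapping step itself. A minimal sketch of what it could look like follows, assuming a hypothetical configuration name (timusan_ispell_cfg) and assuming that en_us.dict, en_us.affix and english.stop exist in the server's tsearch_data directory, which the dictionary definition above requires.

```sql
-- Hypothetical configuration name, not taken from the thread.
-- Copying the built-in english configuration reuses its parser and mappings.
CREATE TEXT SEARCH CONFIGURATION timusan_ispell_cfg (COPY = pg_catalog.english);

-- Route the word-like token types to the Ispell dictionary only.
ALTER TEXT SEARCH CONFIGURATION timusan_ispell_cfg
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH timusan_ispell;
```

Note that to_tsvector takes the name of a configuration as its first argument, not the name of a dictionary.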

When I run a simple query like SELECT to_tsvector('timusan-ispell','smiling') 
I get back the following tsvector:

'smile':1 'smiling':1

As you can see, I get two lexemes with the same position.
The question here is: why does this happen?

Is it normal behavior for the Ispell dictionary to emit multiple lexemes for a
single token? And if so, is this efficient? I mean, why could it not simply
store one lexeme 'smile' which (as with the Snowball dictionary) would match
'smiling' as well when later matched against the accompanying tsquery?
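
One way to compare the two dictionaries directly, without going through a configuration, is ts_lexize. This is only a sketch: it assumes the timusan_ispell dictionary defined above exists, and it uses english_stem, the Snowball dictionary that ships with PostgreSQL.

```sql
-- What the Ispell dictionary emits for the token; the output depends on
-- the en_us dictionary and affix files in use.
SELECT ts_lexize('timusan_ispell', 'smiling');

-- What the built-in Snowball stemmer emits for the same token.
SELECT ts_lexize('english_stem', 'smiling');
```

ts_lexize returns a text array, so a dictionary that produces several normalized forms for one token shows up as an array with several elements.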

Thanks!

Cheers,
Tim


Re: [GENERAL] Full text: Ispell dictionary

2014-05-02 Thread Oleg Bartunov
Yes, it's normal for an ispell dictionary; think of it as a morphological dictionary.
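
To see this behavior per token, ts_debug can be used. The sketch below assumes a configuration named timusan_ispell_cfg (a hypothetical name, not from the thread) that maps word tokens to the timusan_ispell dictionary.

```sql
-- Show, for each token, which dictionary recognized it
-- and which lexemes that dictionary produced.
SELECT alias, token, dictionary, lexemes
FROM ts_debug('timusan_ispell_cfg', 'smiling');
```

The lexemes column is an array, so a token for which the dictionary finds more than one base form appears there with more than one entry.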


Re: [GENERAL] Full text: Ispell dictionary

2014-05-02 Thread Tim van der Linden
On Fri, 2 May 2014 21:12:56 +0400
Oleg Bartunov obartu...@gmail.com wrote:

Hi Oleg

Thanks for the response!

> Yes, it's normal for an ispell dictionary; think of it as a morphological dictionary.

Hmm, I see, that makes sense. I thought the morphological aspect of Ispell
only dealt with splitting up compound words, but it also deals with reducing
the word to a more stem-like form, correct?

As a last question on this, is there a way to stop this dictionary from
emitting multiple lexemes?

The reason I am asking is that, in my (fairly new) understanding of
PostgreSQL's full text search, it is always best to have as few lexemes as
possible saved in the vector. This gives smaller indexes and faster matching
afterwards. Also, if you run a tsquery afterwards too, you can still employ the
power of these multiple lexemes to find a match.

Or...probably answering my own question...if I do not desire this behavior I 
should maybe not use Ispell and simply use another dictionary :)
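
As a sketch of the "use another dictionary" route, the configuration below maps the word token types to the built-in Snowball stemmer instead of Ispell. The configuration name timusan_snowball_cfg is hypothetical and chosen only for illustration.

```sql
-- Hypothetical configuration name, not taken from the thread.
CREATE TEXT SEARCH CONFIGURATION timusan_snowball_cfg (COPY = pg_catalog.simple);

-- Use only the Snowball stemmer (english_stem) for word-like tokens.
ALTER TEXT SEARCH CONFIGURATION timusan_snowball_cfg
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH english_stem;

-- Build the document vector and the query with the same configuration,
-- so both sides are normalized by the same stemmer.
SELECT to_tsvector('timusan_snowball_cfg', 'smiling')
       @@ to_tsquery('timusan_snowball_cfg', 'smile') AS matches;
```

A stemmer returns one normalized form per token, which is the behavior asked about above; the trade-off is that the stem is produced by rules rather than looked up in a word list.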

Thanks again.

Cheers,
Tim

-- 
Tim van der Linden t...@shisaa.jp


Re: [GENERAL] Full text: Ispell dictionary

2014-05-02 Thread Oleg Bartunov
Tim,

you answered your own question - don't use ispell :)


Re: [GENERAL] Full text: Ispell dictionary

2014-05-02 Thread Tim van der Linden
Hi Oleg

Haha, understood!

Thanks for helping me on this one.

Cheers
Tim
