[GENERAL] FTS wildcard and custom ispell dictionary problem

2010-08-19 Thread darklow
Hello,

I am using PostgreSQL 8.4 full text search in following way:

Custom FTS configuration called dc2 with these dictionaries in
following order for asciihword token: latvian_ispell, english_stem,
russian_stem

Latvian ispell dictionary contains words with different endings but
same meaning (latvian langiage specifics, plural words, etc)
The problem starts when using wildcard :* to_tsquery syntax.

For example. If i look for the word kriev i am automatically adding
wildcard using syntax:  to_tsquery('dc2', 'kriev:*');

By searching kriev:* FTS founds word krievs in latvian_ispell
dictionary which is totally ok.

 SELECT * from ts_debug('dc2', 'kriev:*');
  alias   |   description   | token | dictionaries
  |   dictionary   | lexemes
---+-+---+--++--
 asciiword | Word, all ASCII | kriev |
{latvian_ispell,english_stem,russian_stem} | latvian_ispell | {krievs}
 blank | Space symbols   | :*| {}

If understand correctly now database uses not kriev:* but krievs:* for
following queries.

And here is the problem, data contains also word: Krievija, and in
this case search doesn't find it, because now it looks for Krievs:*
and not Kriev:* anymore.

Is there any solution anone could suggest to get results by both
criterias - kriev:* (starting query) and krievs:* (founded in ispell
dict).

Only idea i had is to somehow combine two tsqueries one -
to_tsquery('dc2', 'kriev:*') and to_tsquery('english', 'kriev:*'); so
the search looks for both - kriev:* and krievs:* but anyway didnt
figured out any syntax i could use :(

Thanks

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] FTS wildcard and custom ispell dictionary problem

2010-08-19 Thread Oleg Bartunov

On Thu, 19 Aug 2010, darklow wrote:


Hello,

I am using PostgreSQL 8.4 full text search in following way:

Custom FTS configuration called dc2 with these dictionaries in
following order for asciihword token: latvian_ispell, english_stem,
russian_stem


russian_stem will never called ! Sorry, I have no time right now, will 
be back later.




Latvian ispell dictionary contains words with different endings but
same meaning (latvian langiage specifics, plural words, etc)
The problem starts when using wildcard :* to_tsquery syntax.

For example. If i look for the word kriev i am automatically adding
wildcard using syntax:  to_tsquery('dc2', 'kriev:*');

By searching kriev:* FTS founds word krievs in latvian_ispell
dictionary which is totally ok.

SELECT * from ts_debug('dc2', 'kriev:*');
 alias   |   description   | token | dictionaries
 |   dictionary   | lexemes
---+-+---+--++--
asciiword | Word, all ASCII | kriev |
{latvian_ispell,english_stem,russian_stem} | latvian_ispell | {krievs}
blank | Space symbols   | :*| {}

If understand correctly now database uses not kriev:* but krievs:* for
following queries.

And here is the problem, data contains also word: Krievija, and in
this case search doesn't find it, because now it looks for Krievs:*
and not Kriev:* anymore.

Is there any solution anone could suggest to get results by both
criterias - kriev:* (starting query) and krievs:* (founded in ispell
dict).

Only idea i had is to somehow combine two tsqueries one -
to_tsquery('dc2', 'kriev:*') and to_tsquery('english', 'kriev:*'); so
the search looks for both - kriev:* and krievs:* but anyway didnt
figured out any syntax i could use :(

Thanks




Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] FTS wildcard and custom ispell dictionary problem

2010-08-19 Thread Oleg Bartunov

On Thu, 19 Aug 2010, darklow wrote:


Hello,

I am using PostgreSQL 8.4 full text search in following way:

Custom FTS configuration called dc2 with these dictionaries in
following order for asciihword token: latvian_ispell, english_stem,
russian_stem

Latvian ispell dictionary contains words with different endings but
same meaning (latvian langiage specifics, plural words, etc)
The problem starts when using wildcard :* to_tsquery syntax.

For example. If i look for the word kriev i am automatically adding
wildcard using syntax:  to_tsquery('dc2', 'kriev:*');

By searching kriev:* FTS founds word krievs in latvian_ispell
dictionary which is totally ok.

SELECT * from ts_debug('dc2', 'kriev:*');
 alias   |   description   | token | dictionaries
 |   dictionary   | lexemes
---+-+---+--++--
asciiword | Word, all ASCII | kriev |
{latvian_ispell,english_stem,russian_stem} | latvian_ispell | {krievs}
blank | Space symbols   | :*| {}

If understand correctly now database uses not kriev:* but krievs:* for
following queries.

And here is the problem, data contains also word: Krievija, and in
this case search doesn't find it, because now it looks for Krievs:*
and not Kriev:* anymore.

Is there any solution anone could suggest to get results by both
criterias - kriev:* (starting query) and krievs:* (founded in ispell
dict).

Only idea i had is to somehow combine two tsqueries one -
to_tsquery('dc2', 'kriev:*') and to_tsquery('english', 'kriev:*'); so
the search looks for both - kriev:* and krievs:* but anyway didnt
figured out any syntax i could use :(


select to_tsquery('dc2', 'kriev:*') || to_tsquery('english', 'kriev:*');



Thanks




Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] FTS wildcard and custom ispell dictionary problem

2010-08-19 Thread darklow
Great thanks. This function idea works perfectly!

create function xtsquery (tsquery,tsquery) returns tsquery as $$
select $1 || $2 $$ language sql;
SELECT fts_title, ts_headline(fts_title, query) from object_fts ot,
xtsquery(to_tsquery('dc2', 'kriev:*'), to_tsquery('english',
'kriev:*')) query where fts_tsv @@ query;



2010/8/19 Oleg Bartunov o...@sai.msu.su:
 Write function like:

 test=# create function xtsquery (tsquery,tsquery) returns tsquery as $$
 select $1 || $2 $$ language sql;

 test=# select   query from   xtsquery(to_tsquery('aad:*'),
 to_tsquery('aads:*'))  as query where 'aad:1 aadsj:2'::tsvector @@  query;
       query ---
  'aad':* | 'aad':*
 (1 row)

 Oleg

 PS. Please, write back to mailing list, so other people will read it. I'm
 too busy to help individuals :)

 On Thu, 19 Aug 2010, darklow wrote:

 If i use this syntax, i cannot give tsquery name query to use in for
 example ts_headline and ts_rank_cd functions! :(
 This means i need to define ts_query concatenation every time, and i
 guess this can cause performance issues.

 This works, but concatenation and to_tsquery is called every time i
 need it (up to 3 times if i use also ts_rank_cd and ts_headline
 functions):
 SELECT fts_title, ts_headline(fts_title, (to_tsquery('dc2', 'kriev:*')
 || to_tsquery('english', 'kriev:*'))) from object_fts ot where fts_tsv
 @@ (to_tsquery('dc2', 'kriev:*') || to_tsquery('english', 'kriev:*'));

 These unfortunately doesn't:
 SELECT fts_title, ts_headline(fts_title, query) from object_fts ot
 where fts_tsv @@ (to_tsquery('dc2', 'kriev:*') ||
 to_tsquery('english', 'kriev:*'));

 ERROR:  column query does not exist
 SELECT fts_title, ts_headline(fts_title, query) from object_fts ot,
 (to_tsquery('dc2', 'kriev:*') || to_tsquery('english', 'kriev:*'))
 query where fts_tsv @@ query;

 are there any solutions for this?
 Thanks.

 2010/8/19 Oleg Bartunov o...@sai.msu.su:

 test=# select 'aad:1 aadsj:2'::tsvector @@ (to_tsquery('aad:*') ||
  to_tsquery('aads:*'));
  ?column? --
  t
 (1 row)

 On Thu, 19 Aug 2010, darklow wrote:

 Thanks this solution could work for me. However i can't get to work
 syntax in more complex query, for example using @@ syntax:

 This one works:
 SELECT fts_title, ts_headline(fts_title, query) from object_fts ot,
 to_tsquery('dc2', 'kriev:*') query where query @@ fts_tsv;

 With concatenating doesn't work:
 SELECT fts_title, ts_headline(fts_title, query) from object_fts ot,
 (to_tsquery('dc2', 'kriev:*') || to_tsquery('english', 'kriev:*'))
 query where query @@ fts_tsv;
 ERROR:  syntax error at or near ||

 Any ideas what is correct syntax of this?
 Thanks.


 On Thu, Aug 19, 2010 at 12:43 PM, Oleg Bartunov o...@sai.msu.su wrote:

 On Thu, 19 Aug 2010, darklow wrote:

 Hello,

 I am using PostgreSQL 8.4 full text search in following way:

 Custom FTS configuration called dc2 with these dictionaries in
 following order for asciihword token: latvian_ispell, english_stem,
 russian_stem

 Latvian ispell dictionary contains words with different endings but
 same meaning (latvian langiage specifics, plural words, etc)
 The problem starts when using wildcard :* to_tsquery syntax.

 For example. If i look for the word kriev i am automatically adding
 wildcard using syntax:  to_tsquery('dc2', 'kriev:*');

 By searching kriev:* FTS founds word krievs in latvian_ispell
 dictionary which is totally ok.

 SELECT * from ts_debug('dc2', 'kriev:*');
  alias   |   description   | token | dictionaries
  |   dictionary   | lexemes



 ---+-+---+--++--
 asciiword | Word, all ASCII | kriev |
 {latvian_ispell,english_stem,russian_stem} | latvian_ispell | {krievs}
 blank     | Space symbols   | :*    | {}

 If understand correctly now database uses not kriev:* but krievs:* for
 following queries.

 And here is the problem, data contains also word: Krievija, and in
 this case search doesn't find it, because now it looks for Krievs:*
 and not Kriev:* anymore.

 Is there any solution anone could suggest to get results by both
 criterias - kriev:* (starting query) and krievs:* (founded in ispell
 dict).

 Only idea i had is to somehow combine two tsqueries one -
 to_tsquery('dc2', 'kriev:*') and to_tsquery('english', 'kriev:*'); so
 the search looks for both - kriev:* and krievs:* but anyway didnt
 figured out any syntax i could use :(

 select to_tsquery('dc2', 'kriev:*') || to_tsquery('english',
 'kriev:*');


 Thanks



        Regards,
                Oleg
 _
 Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
 Sternberg Astronomical Institute, Moscow University, Russia
 Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
 phone: +007(495)939-16-83, +007(495)939-23-83



        Regards,
                Oleg
 _
 Oleg