Re: [GENERAL] to_tsvector in 8.2.3
Teodor As a non-C windows user (yes - throw stones at me :) ) Do you have a fixed dll for this patch that I can try? Thanks Richard Teodor Sigaev-2 wrote: Solved, see attached patch. I had found old Celeron-300 box and install Windows on it, and it was very slow :) Nope, same result with this patch. Thank you. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ *** ./contrib/tsearch2.orig/./wordparser/parser.c Thu Mar 22 18:39:23 2007 --- ./contrib/tsearch2/./wordparser/parser.c Thu Mar 22 18:51:23 2007 *** *** 117,123 { if (lc_ctype_is_c()) { ! unsigned int c = *(unsigned int*)(prs-wstr + prs-state-poschar); /* * any non-ascii symbol with multibyte encoding --- 117,123 { if (lc_ctype_is_c()) { ! unsigned int c = *(prs-wstr + prs-state-poschar); /* * any non-ascii symbol with multibyte encoding ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq -- View this message in context: http://www.nabble.com/to_tsvector-in-8.2.3-tf3440132.html#a9790691 Sent from the PostgreSQL - general mailing list archive at Nabble.com. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] to_tsvector in 8.2.3
Sorry, no - I tested on CVS HEAD, so dll isn't compatible :( Wait a bit for 8.2.4 richardcraig wrote: Teodor As a non-C windows user (yes - throw stones at me :) ) Do you have a fixed dll for this patch that I can try? Thanks Richard Teodor Sigaev-2 wrote: Solved, see attached patch. I had found old Celeron-300 box and install Windows on it, and it was very slow :) Nope, same result with this patch. Thank you. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ *** ./contrib/tsearch2.orig/./wordparser/parser.c Thu Mar 22 18:39:23 2007 --- ./contrib/tsearch2/./wordparser/parser.cThu Mar 22 18:51:23 2007 *** *** 117,123 { if (lc_ctype_is_c()) { ! unsigned int c = *(unsigned int*)(prs-wstr + prs-state-poschar); /* * any non-ascii symbol with multibyte encoding --- 117,123 { if (lc_ctype_is_c()) { ! unsigned int c = *(prs-wstr + prs-state-poschar); /* * any non-ascii symbol with multibyte encoding ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] to_tsvector in 8.2.3
On Wed, Mar 21, 2007 at 09:13:55PM +0300, Teodor Sigaev wrote: postgres=# select to_tsvector('test text'); to_tsvector --- 'test text':1 (1 row) Ok. that's related to http://developer.postgresql.org/cvsweb.cgi/pgsql/contrib/tsearch2/wordparser/parser.c.diff?r1=1.11;r2=1.12;f=h commit. Thomas pointed that it can be non-breakable space (0xa0) and that commit assumes any character with C locale and multibyte encoding and 0x7f is alpha. To check theory, pls, apply attached patch. If so, I'm confused, we can not assume that 0xa0 is a space symbol in any multibyte encoding, even in Windows. Nope, same result with this patch. //Magnus ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] to_tsvector in 8.2.3
Solved, see attached patch. I had found old Celeron-300 box and install Windows on it, and it was very slow :) Nope, same result with this patch. Thank you. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ *** ./contrib/tsearch2.orig/./wordparser/parser.c Thu Mar 22 18:39:23 2007 --- ./contrib/tsearch2/./wordparser/parser.cThu Mar 22 18:51:23 2007 *** *** 117,123 { if (lc_ctype_is_c()) { ! unsigned int c = *(unsigned int*)(prs-wstr + prs-state-poschar); /* * any non-ascii symbol with multibyte encoding --- 117,123 { if (lc_ctype_is_c()) { ! unsigned int c = *(prs-wstr + prs-state-poschar); /* * any non-ascii symbol with multibyte encoding ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] to_tsvector in 8.2.3
What parse returns ? 8.1.5 and 8.3 return www=# select parse('test text'); parse -- (1,test) (12, ) (1,text) (3 rows) Also, what is your configuration ? On Wed, 21 Mar 2007, richardcraig wrote: It may solve my query if anybody can tell me if anything has changes in tsearch2 recently? otherwise... I have two installations of Postgres on Windows machines, one is 8.2.1 and the other is 8.2.3 Both installed the same way, selecting tsearch2 during the installation and restoring the same backup database file. In 8.2.1 select to_tsvector('test text') returns 'test':1 'text':2 as expected In 8.2.3 the same code returns 'test text':1 Can anybody help? Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83 ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] to_tsvector in 8.2.3
Oleg Thanks for the quick response. On 8.2.1 I get the same result as yourself, however on 8.2.3 I get (2,test text) Configuration? Both databases are UTF8 encoded with language set to C Standard windows installation except for tsearch2 I also have another machine with SQL_ASCII encoding on 8.2.1 and that's OK As far as I can tell they are identical except for the postgresql versions. It's been a long time since I did much with tsearch2 so I'm suspecting a config problem with it that's I've long forgotten about. Richard Oleg Bartunov wrote: What parse returns ? 8.1.5 and 8.3 return www=# select parse('test text'); parse -- (1,test) (12, ) (1,text) (3 rows) Also, what is your configuration ? On Wed, 21 Mar 2007, richardcraig wrote: It may solve my query if anybody can tell me if anything has changes in tsearch2 recently? otherwise... I have two installations of Postgres on Windows machines, one is 8.2.1 and the other is 8.2.3 Both installed the same way, selecting tsearch2 during the installation and restoring the same backup database file. In 8.2.1 select to_tsvector('test text') returns 'test':1 'text':2 as expected In 8.2.3 the same code returns 'test text':1 Can anybody help? Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83 ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match -- View this message in context: http://www.nabble.com/to_tsvector-in-8.2.3-tf3440132.html#a9592631 Sent from the PostgreSQL - general mailing list archive at Nabble.com. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] to_tsvector in 8.2.3
As an update, I've just copied the tsvector.dll file from an 8.2.1 computer to the 8.2.3 computer and it's working OK now. I suspect a change in the dll is causing the problem. Richard richardcraig wrote: Oleg Thanks for the quick response. On 8.2.1 I get the same result as yourself, however on 8.2.3 I get (2,test text) Configuration? Both databases are UTF8 encoded with language set to C Standard windows installation except for tsearch2 I also have another machine with SQL_ASCII encoding on 8.2.1 and that's OK As far as I can tell they are identical except for the postgresql versions. It's been a long time since I did much with tsearch2 so I'm suspecting a config problem with it that's I've long forgotten about. Richard Oleg Bartunov wrote: What parse returns ? 8.1.5 and 8.3 return www=# select parse('test text'); parse -- (1,test) (12, ) (1,text) (3 rows) Also, what is your configuration ? On Wed, 21 Mar 2007, richardcraig wrote: It may solve my query if anybody can tell me if anything has changes in tsearch2 recently? otherwise... I have two installations of Postgres on Windows machines, one is 8.2.1 and the other is 8.2.3 Both installed the same way, selecting tsearch2 during the installation and restoring the same backup database file. In 8.2.1 select to_tsvector('test text') returns 'test':1 'text':2 as expected In 8.2.3 the same code returns 'test text':1 Can anybody help? Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83 ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match -- View this message in context: http://www.nabble.com/to_tsvector-in-8.2.3-tf3440132.html#a9592812 Sent from the PostgreSQL - general mailing list archive at Nabble.com. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] to_tsvector in 8.2.3
I can't reproduce your problem, but I have not Windows box, can anybody reproduce that? contrib_regression=# select version(); version PostgreSQL 8.2.3 on i386-unknown-freebsd6.2, compiled by GCC gcc (GCC) 3.4.6 [FreeBSD] 20060305 (1 row) contrib_regression=# show server_encoding ; server_encoding - UTF8 (1 row) contrib_regression=# show lc_collate; lc_collate C (1 row) contrib_regression=# show lc_ctype; lc_ctype -- C (1 row) contrib_regression=# select to_tsvector('test text'); to_tsvector --- 'test':1 'text':2 (1 row) -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] to_tsvector in 8.2.3
On Wednesday 21 March 2007 14:25, Teodor Sigaev wrote: | I can't reproduce your problem, but I have not Windows box, can anybody | reproduce that? just a guess in the wild; I once had a similar phenomen and tracked it down to a non breaking space character (0xA0). Since then I'm patching the tsearch2 lexer: --- postgresql-8.1.5/contrib/tsearch2/wordparser/parser.l +++ postgresql-8.1.4/contrib/tsearch2/wordparser/parser.l @@ -78,8 +78,8 @@ /* cyrillic koi8 char */ CYRALNUM [0-9\200-\377] CYRALPHA [\200-\377] -ALPHA [a-zA-Z\200-\377] -ALNUM [0-9a-zA-Z\200-\377] +ALPHA [a-zA-Z\200-\237\241-\377] +ALNUM [0-9a-zA-Z\200-\237\241-\377] HOSTNAME ([-_[:alnum:]]+\.)+[[:alpha:]]+ @@ -307,7 +307,7 @@ return UWORD; } -[ \r\n\t]+ { +[ \240\r\n\t]+ { token = tsearch2_yytext; tokenlen = tsearch2_yyleng; return SPACE; Ciao, Thomas -- Thomas Pundt [EMAIL PROTECTED] http://rp-online.de/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] to_tsvector in 8.2.3
8.2 has fully rewritten text parser based on POSIX is* functions. Thomas Pundt wrote: On Wednesday 21 March 2007 14:25, Teodor Sigaev wrote: | I can't reproduce your problem, but I have not Windows box, can anybody | reproduce that? just a guess in the wild; I once had a similar phenomen and tracked it down to a non breaking space character (0xA0). Since then I'm patching the tsearch2 lexer: --- postgresql-8.1.5/contrib/tsearch2/wordparser/parser.l +++ postgresql-8.1.4/contrib/tsearch2/wordparser/parser.l @@ -78,8 +78,8 @@ /* cyrillic koi8 char */ CYRALNUM [0-9\200-\377] CYRALPHA [\200-\377] -ALPHA [a-zA-Z\200-\377] -ALNUM [0-9a-zA-Z\200-\377] +ALPHA [a-zA-Z\200-\237\241-\377] +ALNUM [0-9a-zA-Z\200-\237\241-\377] HOSTNAME ([-_[:alnum:]]+\.)+[[:alpha:]]+ @@ -307,7 +307,7 @@ return UWORD; } -[ \r\n\t]+ { +[ \240\r\n\t]+ { token = tsearch2_yytext; tokenlen = tsearch2_yyleng; return SPACE; Ciao, Thomas -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] to_tsvector in 8.2.3
On Wed, 21 Mar 2007, richardcraig wrote: As an update, I've just copied the tsvector.dll file from an 8.2.1 computer to the 8.2.3 computer and it's working OK now. I suspect a change in the dll is causing the problem. Interesting. Richard richardcraig wrote: Oleg Thanks for the quick response. On 8.2.1 I get the same result as yourself, however on 8.2.3 I get (2,test text) Configuration? Both databases are UTF8 encoded with language set to C Standard windows installation except for tsearch2 I also have another machine with SQL_ASCII encoding on 8.2.1 and that's OK As far as I can tell they are identical except for the postgresql versions. It's been a long time since I did much with tsearch2 so I'm suspecting a config problem with it that's I've long forgotten about. Richard Oleg Bartunov wrote: What parse returns ? 8.1.5 and 8.3 return www=# select parse('test text'); parse -- (1,test) (12, ) (1,text) (3 rows) Also, what is your configuration ? On Wed, 21 Mar 2007, richardcraig wrote: It may solve my query if anybody can tell me if anything has changes in tsearch2 recently? otherwise... I have two installations of Postgres on Windows machines, one is 8.2.1 and the other is 8.2.3 Both installed the same way, selecting tsearch2 during the installation and restoring the same backup database file. In 8.2.1 select to_tsvector('test text') returns 'test':1 'text':2 as expected In 8.2.3 the same code returns 'test text':1 Can anybody help? Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83 ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83 ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] to_tsvector in 8.2.3
On Wed, Mar 21, 2007 at 04:25:30PM +0300, Teodor Sigaev wrote: I can't reproduce your problem, but I have not Windows box, can anybody reproduce that? contrib_regression=# select version(); version PostgreSQL 8.2.3 on i386-unknown-freebsd6.2, compiled by GCC gcc (GCC) 3.4.6 [FreeBSD] 20060305 (1 row) contrib_regression=# show server_encoding ; server_encoding - UTF8 (1 row) contrib_regression=# show lc_collate; lc_collate C (1 row) contrib_regression=# show lc_ctype; lc_ctype -- C (1 row) contrib_regression=# select to_tsvector('test text'); to_tsvector --- 'test':1 'text':2 (1 row) postgres=# select version();show lc_collate;show lc_ctype; version - PostgreSQL 8.3devel on i686-pc-mingw32, compiled by GCC gcc.exe (GCC) 3.4.2 (mi ngw-special) (1 row) lc_collate C (1 row) lc_ctype -- C (1 row) postgres=# select to_tsvector('test text'); to_tsvector --- 'test text':1 (1 row) (encoding is UTF8 on a fresh database) //Magnus ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] to_tsvector in 8.2.3
postgres=# select to_tsvector('test text'); to_tsvector --- 'test text':1 (1 row) Ok. that's related to http://developer.postgresql.org/cvsweb.cgi/pgsql/contrib/tsearch2/wordparser/parser.c.diff?r1=1.11;r2=1.12;f=h commit. Thomas pointed that it can be non-breakable space (0xa0) and that commit assumes any character with C locale and multibyte encoding and 0x7f is alpha. To check theory, pls, apply attached patch. If so, I'm confused, we can not assume that 0xa0 is a space symbol in any multibyte encoding, even in Windows. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ *** ./contrib/tsearch2/wordparser/parser.c.orig Wed Mar 21 20:41:23 2007 --- ./contrib/tsearch2/wordparser/parser.c Wed Mar 21 21:10:39 2007 *** *** 124,130 --- 124,134 * with C-locale is an alpha character */ if ( c 0x7f ) + { + if ( c == 0xa0 ) + return 0; return 1; + } return isalnum(0xff c); } *** *** 157,163 --- 161,171 * with C-locale is an alpha character */ if ( c 0x7f ) + { + if ( c == 0xa0 ) + return 0; return 1; + } return isalpha(0xff c); } ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster