Re: [GENERAL] to_tsvector in 8.2.3

2007-04-02 Thread richardcraig

Teodor

As a non-C windows user (yes - throw stones at me :) ) Do you have a fixed
dll for this patch that I can try?

Thanks

Richard


Teodor Sigaev-2 wrote:
 
 Solved, see attached patch. I had found old Celeron-300 box and install
 Windows 
 on it, and it was very slow :)
 
 
 Nope, same result with this patch.
 Thank you.
 
 -- 
 Teodor Sigaev   E-mail: [EMAIL PROTECTED]
 WWW:
 http://www.sigaev.ru/
 
 *** ./contrib/tsearch2.orig/./wordparser/parser.c Thu Mar 22 18:39:23 2007
 --- ./contrib/tsearch2/./wordparser/parser.c  Thu Mar 22 18:51:23 2007
 ***
 *** 117,123 
   {
   if (lc_ctype_is_c())
   {
 ! unsigned int c = *(unsigned int*)(prs-wstr + 
 prs-state-poschar);
   
   /*
* any non-ascii symbol with multibyte encoding
 --- 117,123 
   {
   if (lc_ctype_is_c())
   {
 ! unsigned int c = *(prs-wstr + prs-state-poschar);
   
   /*
* any non-ascii symbol with multibyte encoding
 
 
 ---(end of broadcast)---
 TIP 3: Have you checked our extensive FAQ?
 
http://www.postgresql.org/docs/faq
 
 

-- 
View this message in context: 
http://www.nabble.com/to_tsvector-in-8.2.3-tf3440132.html#a9790691
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] to_tsvector in 8.2.3

2007-04-02 Thread Teodor Sigaev

Sorry, no - I tested on CVS HEAD, so dll isn't compatible :(
Wait a bit for 8.2.4

richardcraig wrote:

Teodor

As a non-C windows user (yes - throw stones at me :) ) Do you have a fixed
dll for this patch that I can try?

Thanks

Richard


Teodor Sigaev-2 wrote:

Solved, see attached patch. I had found old Celeron-300 box and install
Windows 
on it, and it was very slow :)




Nope, same result with this patch.

Thank you.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
WWW:
http://www.sigaev.ru/

*** ./contrib/tsearch2.orig/./wordparser/parser.c   Thu Mar 22 18:39:23 2007
--- ./contrib/tsearch2/./wordparser/parser.cThu Mar 22 18:51:23 2007
***
*** 117,123 
{
if (lc_ctype_is_c())
{
!   unsigned int c = *(unsigned int*)(prs-wstr + 
prs-state-poschar);
  
  			/*

 * any non-ascii symbol with multibyte encoding
--- 117,123 
{
if (lc_ctype_is_c())
{
!   unsigned int c = *(prs-wstr + prs-state-poschar);
  
  			/*

 * any non-ascii symbol with multibyte encoding


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq






--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] to_tsvector in 8.2.3

2007-03-22 Thread Magnus Hagander
On Wed, Mar 21, 2007 at 09:13:55PM +0300, Teodor Sigaev wrote:
 postgres=# select to_tsvector('test text');
   to_tsvector
 ---
  'test text':1
 (1 row)
 Ok. that's related to 
 http://developer.postgresql.org/cvsweb.cgi/pgsql/contrib/tsearch2/wordparser/parser.c.diff?r1=1.11;r2=1.12;f=h
 commit. Thomas pointed that it can be non-breakable space (0xa0) and that 
 commit assumes any character with C locale and multibyte encoding and  
 0x7f is alpha.
 To check theory, pls, apply attached patch.
 
 If so, I'm confused, we can not assume that 0xa0 is a space symbol in any 
 multibyte encoding, even in Windows.

Nope, same result with this patch.

//Magnus


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] to_tsvector in 8.2.3

2007-03-22 Thread Teodor Sigaev
Solved, see attached patch. I had found old Celeron-300 box and install Windows 
on it, and it was very slow :)




Nope, same result with this patch.

Thank you.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/
*** ./contrib/tsearch2.orig/./wordparser/parser.c   Thu Mar 22 18:39:23 2007
--- ./contrib/tsearch2/./wordparser/parser.cThu Mar 22 18:51:23 2007
***
*** 117,123 
{
if (lc_ctype_is_c())
{
!   unsigned int c = *(unsigned int*)(prs-wstr + 
prs-state-poschar);
  
/*
 * any non-ascii symbol with multibyte encoding
--- 117,123 
{
if (lc_ctype_is_c())
{
!   unsigned int c = *(prs-wstr + prs-state-poschar);
  
/*
 * any non-ascii symbol with multibyte encoding

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] to_tsvector in 8.2.3

2007-03-21 Thread Oleg Bartunov

What parse returns ? 8.1.5 and 8.3 return

www=# select parse('test text');
  parse
--
 (1,test)
 (12, )
 (1,text)
(3 rows)

Also, what is your configuration ?


On Wed, 21 Mar 2007, richardcraig wrote:



It may solve my query if anybody can tell me if anything has changes in
tsearch2 recently? otherwise...

I have two installations of Postgres on Windows machines, one is 8.2.1 and
the other is 8.2.3

Both installed the same way, selecting tsearch2 during the installation and
restoring the same backup database file.

In 8.2.1
 select to_tsvector('test text')
returns
 'test':1 'text':2
as expected

In 8.2.3 the same code returns
 'test text':1

Can anybody help?



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] to_tsvector in 8.2.3

2007-03-21 Thread richardcraig

Oleg

Thanks for the quick response.

On 8.2.1 I get the same result as yourself, however on 8.2.3 I get 
(2,test text)

Configuration?
Both databases are UTF8 encoded with language set to C
Standard windows installation except for tsearch2
I also have another machine with SQL_ASCII encoding on 8.2.1 and that's OK

As far as I can tell they are identical except for the postgresql versions.

It's been a long time since I did much with tsearch2 so I'm suspecting a
config problem with it that's I've long forgotten about.

Richard


Oleg Bartunov wrote:
 
 What parse returns ? 8.1.5 and 8.3 return
 
 www=# select parse('test text');
parse
 --
   (1,test)
   (12, )
   (1,text)
 (3 rows)
 
 Also, what is your configuration ?
 
 
 On Wed, 21 Mar 2007, richardcraig wrote:
 

 It may solve my query if anybody can tell me if anything has changes in
 tsearch2 recently? otherwise...

 I have two installations of Postgres on Windows machines, one is 8.2.1
 and
 the other is 8.2.3

 Both installed the same way, selecting tsearch2 during the installation
 and
 restoring the same backup database file.

 In 8.2.1
  select to_tsvector('test text')
 returns
  'test':1 'text':2
 as expected

 In 8.2.3 the same code returns
  'test text':1

 Can anybody help?

 
   Regards,
   Oleg
 _
 Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
 Sternberg Astronomical Institute, Moscow University, Russia
 Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
 phone: +007(495)939-16-83, +007(495)939-23-83
 
 ---(end of broadcast)---
 TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
 
 

-- 
View this message in context: 
http://www.nabble.com/to_tsvector-in-8.2.3-tf3440132.html#a9592631
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] to_tsvector in 8.2.3

2007-03-21 Thread richardcraig

As an update, I've just copied the tsvector.dll file from an 8.2.1 computer
to the 8.2.3 computer and it's working OK now. I suspect a change in the dll
is causing the problem.

Richard



richardcraig wrote:
 
 Oleg
 
 Thanks for the quick response.
 
 On 8.2.1 I get the same result as yourself, however on 8.2.3 I get 
 (2,test text)
 
 Configuration?
 Both databases are UTF8 encoded with language set to C
 Standard windows installation except for tsearch2
 I also have another machine with SQL_ASCII encoding on 8.2.1 and that's OK
 
 As far as I can tell they are identical except for the postgresql
 versions.
 
 It's been a long time since I did much with tsearch2 so I'm suspecting a
 config problem with it that's I've long forgotten about.
 
 Richard
 
 
 Oleg Bartunov wrote:
 
 What parse returns ? 8.1.5 and 8.3 return
 
 www=# select parse('test text');
parse
 --
   (1,test)
   (12, )
   (1,text)
 (3 rows)
 
 Also, what is your configuration ?
 
 
 On Wed, 21 Mar 2007, richardcraig wrote:
 

 It may solve my query if anybody can tell me if anything has changes in
 tsearch2 recently? otherwise...

 I have two installations of Postgres on Windows machines, one is 8.2.1
 and
 the other is 8.2.3

 Both installed the same way, selecting tsearch2 during the installation
 and
 restoring the same backup database file.

 In 8.2.1
  select to_tsvector('test text')
 returns
  'test':1 'text':2
 as expected

 In 8.2.3 the same code returns
  'test text':1

 Can anybody help?

 
  Regards,
  Oleg
 _
 Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
 Sternberg Astronomical Institute, Moscow University, Russia
 Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
 phone: +007(495)939-16-83, +007(495)939-23-83
 
 ---(end of broadcast)---
 TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
 
 
 
 

-- 
View this message in context: 
http://www.nabble.com/to_tsvector-in-8.2.3-tf3440132.html#a9592812
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] to_tsvector in 8.2.3

2007-03-21 Thread Teodor Sigaev
I can't reproduce your problem, but I have not Windows box, can anybody 
reproduce that?



contrib_regression=# select version();
 version 

 PostgreSQL 8.2.3 on i386-unknown-freebsd6.2, compiled by GCC gcc (GCC) 3.4.6 
[FreeBSD] 20060305

(1 row)
contrib_regression=#  show server_encoding ;
 server_encoding
-
 UTF8
(1 row)

contrib_regression=# show lc_collate;
 lc_collate

 C
(1 row)

contrib_regression=# show lc_ctype;
 lc_ctype
--
 C
(1 row)

contrib_regression=# select to_tsvector('test text');
to_tsvector
---
 'test':1 'text':2
(1 row)

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] to_tsvector in 8.2.3

2007-03-21 Thread Thomas Pundt
On Wednesday 21 March 2007 14:25, Teodor Sigaev wrote:
| I can't reproduce your problem, but I have not Windows box, can anybody
| reproduce that?

just a guess in the wild; I once had a similar phenomen and tracked it down
to a non breaking space character (0xA0). Since then I'm patching the
tsearch2 lexer:

--- postgresql-8.1.5/contrib/tsearch2/wordparser/parser.l
+++ postgresql-8.1.4/contrib/tsearch2/wordparser/parser.l
@@ -78,8 +78,8 @@
 /* cyrillic koi8 char */
 CYRALNUM   [0-9\200-\377]
 CYRALPHA   [\200-\377]
-ALPHA  [a-zA-Z\200-\377]
-ALNUM  [0-9a-zA-Z\200-\377]
+ALPHA  [a-zA-Z\200-\237\241-\377]
+ALNUM  [0-9a-zA-Z\200-\237\241-\377]
 
 
 HOSTNAME   ([-_[:alnum:]]+\.)+[[:alpha:]]+
@@ -307,7 +307,7 @@
return UWORD; 
 }
 
-[ \r\n\t]+ {
+[ \240\r\n\t]+ {
token = tsearch2_yytext;
tokenlen = tsearch2_yyleng;
return SPACE;


Ciao,
Thomas

-- 
Thomas Pundt [EMAIL PROTECTED]  http://rp-online.de/ 

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] to_tsvector in 8.2.3

2007-03-21 Thread Teodor Sigaev

8.2 has fully rewritten text parser based on POSIX is* functions.

Thomas Pundt wrote:

On Wednesday 21 March 2007 14:25, Teodor Sigaev wrote:
| I can't reproduce your problem, but I have not Windows box, can anybody
| reproduce that?

just a guess in the wild; I once had a similar phenomen and tracked it down
to a non breaking space character (0xA0). Since then I'm patching the
tsearch2 lexer:

--- postgresql-8.1.5/contrib/tsearch2/wordparser/parser.l
+++ postgresql-8.1.4/contrib/tsearch2/wordparser/parser.l
@@ -78,8 +78,8 @@
 /* cyrillic koi8 char */
 CYRALNUM   [0-9\200-\377]
 CYRALPHA   [\200-\377]
-ALPHA  [a-zA-Z\200-\377]
-ALNUM  [0-9a-zA-Z\200-\377]
+ALPHA  [a-zA-Z\200-\237\241-\377]
+ALNUM  [0-9a-zA-Z\200-\237\241-\377]
 
 
 HOSTNAME   ([-_[:alnum:]]+\.)+[[:alpha:]]+

@@ -307,7 +307,7 @@
return UWORD; 
 }
 
-[ \r\n\t]+ {

+[ \240\r\n\t]+ {
token = tsearch2_yytext;
tokenlen = tsearch2_yyleng;
return SPACE;


Ciao,
Thomas



--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/


Re: [GENERAL] to_tsvector in 8.2.3

2007-03-21 Thread Oleg Bartunov

On Wed, 21 Mar 2007, richardcraig wrote:



As an update, I've just copied the tsvector.dll file from an 8.2.1 computer
to the 8.2.3 computer and it's working OK now. I suspect a change in the dll
is causing the problem.


Interesting.



Richard



richardcraig wrote:


Oleg

Thanks for the quick response.

On 8.2.1 I get the same result as yourself, however on 8.2.3 I get
(2,test text)

Configuration?
Both databases are UTF8 encoded with language set to C
Standard windows installation except for tsearch2
I also have another machine with SQL_ASCII encoding on 8.2.1 and that's OK

As far as I can tell they are identical except for the postgresql
versions.

It's been a long time since I did much with tsearch2 so I'm suspecting a
config problem with it that's I've long forgotten about.

Richard


Oleg Bartunov wrote:


What parse returns ? 8.1.5 and 8.3 return

www=# select parse('test text');
   parse
--
  (1,test)
  (12, )
  (1,text)
(3 rows)

Also, what is your configuration ?


On Wed, 21 Mar 2007, richardcraig wrote:



It may solve my query if anybody can tell me if anything has changes in
tsearch2 recently? otherwise...

I have two installations of Postgres on Windows machines, one is 8.2.1
and
the other is 8.2.3

Both installed the same way, selecting tsearch2 during the installation
and
restoring the same backup database file.

In 8.2.1
 select to_tsvector('test text')
returns
 'test':1 'text':2
as expected

In 8.2.3 the same code returns
 'test text':1

Can anybody help?



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match










Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] to_tsvector in 8.2.3

2007-03-21 Thread Magnus Hagander
On Wed, Mar 21, 2007 at 04:25:30PM +0300, Teodor Sigaev wrote:
 I can't reproduce your problem, but I have not Windows box, can anybody 
 reproduce that?
 
 
 contrib_regression=# select version();
  version 
 
  PostgreSQL 8.2.3 on i386-unknown-freebsd6.2, compiled by GCC gcc (GCC) 
  3.4.6 [FreeBSD] 20060305
 (1 row)
 contrib_regression=#  show server_encoding ;
  server_encoding
 -
  UTF8
 (1 row)
 
 contrib_regression=# show lc_collate;
  lc_collate
 
  C
 (1 row)
 
 contrib_regression=# show lc_ctype;
  lc_ctype
 --
  C
 (1 row)
 
 contrib_regression=# select to_tsvector('test text');
 to_tsvector
 ---
  'test':1 'text':2
 (1 row)

postgres=# select version();show lc_collate;show lc_ctype;
   version


-
 PostgreSQL 8.3devel on i686-pc-mingw32, compiled by GCC gcc.exe (GCC)
3.4.2 (mi
ngw-special)
(1 row)

 lc_collate

 C
(1 row)

 lc_ctype
--
 C
(1 row)

postgres=# select to_tsvector('test text');
  to_tsvector
---
 'test text':1
(1 row)


(encoding is UTF8 on a fresh database)

//Magnus


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] to_tsvector in 8.2.3

2007-03-21 Thread Teodor Sigaev

postgres=# select to_tsvector('test text');
  to_tsvector
---
 'test text':1
(1 row)
Ok. that's related to 
http://developer.postgresql.org/cvsweb.cgi/pgsql/contrib/tsearch2/wordparser/parser.c.diff?r1=1.11;r2=1.12;f=h
commit. Thomas pointed that it can be non-breakable space (0xa0) and that commit 
assumes any character with C locale and multibyte encoding and  0x7f is alpha.

To check theory, pls, apply attached patch.

If so, I'm confused, we can not assume that 0xa0 is a space symbol in any 
multibyte encoding, even in Windows.




--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/
*** ./contrib/tsearch2/wordparser/parser.c.orig Wed Mar 21 20:41:23 2007
--- ./contrib/tsearch2/wordparser/parser.c  Wed Mar 21 21:10:39 2007
***
*** 124,130 
--- 124,134 
 * with C-locale is an alpha character
 */
if ( c  0x7f )
+   {
+   if ( c == 0xa0 )
+   return 0;
return 1;
+   }
  
return isalnum(0xff  c);
}
***
*** 157,163 
--- 161,171 
 * with C-locale is an alpha character
 */
if ( c  0x7f )
+   {
+   if ( c == 0xa0 )
+   return 0;
return 1;
+   }
  
return isalpha(0xff  c);
}

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster