Re: [GENERAL] Incorrect FTS result with GIN index

2010-07-28 Thread Oleg Bartunov

Tom,

you can download dump http://mira.sai.msu.su/~megera/tmp/search_tab.dump

Oleg
On Tue, 27 Jul 2010, Tom Lane wrote:


Oleg Bartunov o...@sai.msu.su writes:

I recommend post your problem to -hackers mailing list. I have no idea,
what could be a problem.


I wonder whether the problem is not windows versus non windows but
original database versus copies.  If it is a GIN bug it seems quite
possible that it would depend on the order of insertion of the index
entries, which a simple dump-and-reload probably wouldn't duplicate.

If you were working from a dump it'd be easy to try creating the index
before populating the table to see if the bug can be reproduced then,
but there's no certainty that would provoke the bug.

The rest of us have not seen the dump data, so we have no hope of
doing anything with this report anyway.

regards, tom lane




Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Incorrect FTS result with GIN index

2010-07-27 Thread Tom Lane
Oleg Bartunov o...@sai.msu.su writes:
 I recommend post your problem to -hackers mailing list. I have no idea,
 what could be a problem.

I wonder whether the problem is not windows versus non windows but
original database versus copies.  If it is a GIN bug it seems quite
possible that it would depend on the order of insertion of the index
entries, which a simple dump-and-reload probably wouldn't duplicate.

If you were working from a dump it'd be easy to try creating the index
before populating the table to see if the bug can be reproduced then,
but there's no certainty that would provoke the bug.

The rest of us have not seen the dump data, so we have no hope of
doing anything with this report anyway.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Incorrect FTS result with GIN index

2010-07-26 Thread Artur Dabrowski

Hello Oleg,

I totally agree, that the problem should be fixed. Saying this, I need to
add that:
- I have no knowledge of postgres development,
- I cannot dedicate any significant time to this problem,
- I am no longer working for the project where the problem occurred,
- In the mentioned project the problem is not considered business-critical
at the moment (although it may be in the future).

Nevertheless I think it should be still interesting for postgres developers
community to fix it. The point is I have no needed knowledge nor time to fix
it.

As to my postgres setup - it's nothing special, it's just a regular version
from postgres' webpage.

Best regards
Artur







Oleg Bartunov wrote:
 
 Artur,
 
 you could get much more problems in future. Full text search problem may
 be
 signature of more general problem with your postgres setup. So, I'd
 recommend
 to find a source of the problem
 
 
 Oleg
 

-- 
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Incorrect-FTS-results-with-GIN-index-tp1928607p2227845.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Incorrect FTS result with GIN index

2010-07-24 Thread Oleg Bartunov

Artur,

you could get much more problems in future. Full text search problem may be
signature of more general problem with your postgres setup. So, I'd recommend
to find a source of the problem


Oleg
On Tue, 20 Jul 2010, Artur Dabrowski wrote:



Oleg,

thanks for your help.

I sent a post to pg-hackers list:
http://old.nabble.com/Query-results-differ-depending-on-operating-system-%28using-GIN%29-ts29213082.html

As to compiling pg... I will no do this since I do not really feel
comfortable doing it and cannot dedicate too much time to this problem.

Artur



Oleg Bartunov wrote:


Artur,

I recommend post your problem to -hackers mailing list. I have no idea,
what could be a problem.

My machine is:
uname -a
Linux mira 2.6.33-020633-generic #020633 SMP Thu Feb 25 10:10:03 UTC 2010
x86_64 GNU/Linux

PostgreSQL 8.4.4 on x86_64-unknown-linux-gnu, compiled by GCC gcc (Ubuntu
4.4.1-4ubuntu9) 4.4.1, 64-bit

As a last resort I recommend you to compile pg yourself and see if the
problem exists.

Oleg


On Tue, 20 Jul 2010, Artur Dabrowski wrote:



I tested the same backup on our CentOS 5.4 virtual machine (running on
xen
server) and the results are really weird (118 rows, comparing to 116 on
win
xp and 123 expected):

Aggregate  (cost=104.00..104.01 rows=1 width=0) (actual
time=120.373..120.374 rows=1 loops=1)
 -  Bitmap Heap Scan on search_tab  (cost=5.35..103.93 rows=25 width=0)
(actual time=59.418..120.137 rows=118 loops=1)
   Recheck Cond: ((to_tsvector('german'::regconfig, keywords) @@
'''ee'':*'::tsquery) AND (to_tsvector('german'::regconfig, keywords) @@
'''dd'':*'::tsquery))
   -  Bitmap Index Scan on idx_keywords_ger  (cost=0.00..5.34
rows=25
width=0) (actual time=59.229..59.229 rows=495 loops=1)
 Index Cond: ((to_tsvector('german'::regconfig, keywords) @@
'''ee'':*'::tsquery) AND (to_tsvector('german'::regconfig, keywords) @@
'''dd'':*'::tsquery))
Total runtime: 120.670 ms

And here are the configuration details:

PostgreSQL:
postgresql84-server-8.4.4-1.el5_5.1

# uname -r
2.6.18-164.15.1.el5xen

# cat /etc/redhat-release
CentOS release 5.4 (Final)

# cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 15
model name  : Intel(R) Xeon(R) CPU5140  @ 2.33GHz
stepping: 6
cpu MHz : 2333.416
cache size  : 4096 KB
physical id : 0
siblings: 1
core id : 0
cpu cores   : 1
fpu : yes
fpu_exception   : yes
cpuid level : 10
wp  : yes
flags   : fpu de tsc msr pae cx8 apic sep cmov pat clflush acpi
mmx
fxsr sse sse2 ss ht syscall lm constant_tsc pni cx16 lahf_lm
bogomips: 5835.83
clflush size: 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:







Oleg Bartunov wrote:


Artur,

I don't know, but could you try linux machine ?

Oleg






Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general







Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Incorrect FTS result with GIN index

2010-07-20 Thread Artur Dabrowski

I tested the same backup on our CentOS 5.4 virtual machine (running on xen
server) and the results are really weird (118 rows, comparing to 116 on win
xp and 123 expected):

Aggregate  (cost=104.00..104.01 rows=1 width=0) (actual
time=120.373..120.374 rows=1 loops=1)
  -  Bitmap Heap Scan on search_tab  (cost=5.35..103.93 rows=25 width=0)
(actual time=59.418..120.137 rows=118 loops=1)
Recheck Cond: ((to_tsvector('german'::regconfig, keywords) @@
'''ee'':*'::tsquery) AND (to_tsvector('german'::regconfig, keywords) @@
'''dd'':*'::tsquery))
-  Bitmap Index Scan on idx_keywords_ger  (cost=0.00..5.34 rows=25
width=0) (actual time=59.229..59.229 rows=495 loops=1)
  Index Cond: ((to_tsvector('german'::regconfig, keywords) @@
'''ee'':*'::tsquery) AND (to_tsvector('german'::regconfig, keywords) @@
'''dd'':*'::tsquery))
Total runtime: 120.670 ms

And here are the configuration details:

PostgreSQL:
postgresql84-server-8.4.4-1.el5_5.1

# uname -r
2.6.18-164.15.1.el5xen

# cat /etc/redhat-release
CentOS release 5.4 (Final)

# cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 15
model name  : Intel(R) Xeon(R) CPU5140  @ 2.33GHz
stepping: 6
cpu MHz : 2333.416
cache size  : 4096 KB
physical id : 0
siblings: 1
core id : 0
cpu cores   : 1
fpu : yes
fpu_exception   : yes
cpuid level : 10
wp  : yes
flags   : fpu de tsc msr pae cx8 apic sep cmov pat clflush acpi mmx
fxsr sse sse2 ss ht syscall lm constant_tsc pni cx16 lahf_lm
bogomips: 5835.83
clflush size: 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:







Oleg Bartunov wrote:
 
 Artur,
 
 I don't know, but could you try linux machine ?
 
 Oleg
 

-- 
View this message in context: 
http://old.nabble.com/Incorrect-FTS-results-with-GIN-index-tp29172750p29212116.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Incorrect FTS result with GIN index

2010-07-20 Thread Artur Dabrowski

The CentOS used for testing is a 64-bits version.


Artur Dabrowski wrote:
 
 I tested the same backup on our CentOS 5.4 virtual machine (running on xen
 server) and the results are really weird (118 rows, comparing to 116 on
 win xp and 123 expected):
 
 
 

-- 
View this message in context: 
http://old.nabble.com/Incorrect-FTS-results-with-GIN-index-tp29172750p29212162.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Incorrect FTS result with GIN index

2010-07-20 Thread Oleg Bartunov

Artur,

I recommend post your problem to -hackers mailing list. I have no idea,
what could be a problem.

My machine is:
uname -a
Linux mira 2.6.33-020633-generic #020633 SMP Thu Feb 25 10:10:03 UTC 2010 
x86_64 GNU/Linux

PostgreSQL 8.4.4 on x86_64-unknown-linux-gnu, compiled by GCC gcc (Ubuntu 
4.4.1-4ubuntu9) 4.4.1, 64-bit

As a last resort I recommend you to compile pg yourself and see if the 
problem exists.


Oleg


On Tue, 20 Jul 2010, Artur Dabrowski wrote:



I tested the same backup on our CentOS 5.4 virtual machine (running on xen
server) and the results are really weird (118 rows, comparing to 116 on win
xp and 123 expected):

Aggregate  (cost=104.00..104.01 rows=1 width=0) (actual
time=120.373..120.374 rows=1 loops=1)
 -  Bitmap Heap Scan on search_tab  (cost=5.35..103.93 rows=25 width=0)
(actual time=59.418..120.137 rows=118 loops=1)
   Recheck Cond: ((to_tsvector('german'::regconfig, keywords) @@
'''ee'':*'::tsquery) AND (to_tsvector('german'::regconfig, keywords) @@
'''dd'':*'::tsquery))
   -  Bitmap Index Scan on idx_keywords_ger  (cost=0.00..5.34 rows=25
width=0) (actual time=59.229..59.229 rows=495 loops=1)
 Index Cond: ((to_tsvector('german'::regconfig, keywords) @@
'''ee'':*'::tsquery) AND (to_tsvector('german'::regconfig, keywords) @@
'''dd'':*'::tsquery))
Total runtime: 120.670 ms

And here are the configuration details:

PostgreSQL:
postgresql84-server-8.4.4-1.el5_5.1

# uname -r
2.6.18-164.15.1.el5xen

# cat /etc/redhat-release
CentOS release 5.4 (Final)

# cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 15
model name  : Intel(R) Xeon(R) CPU5140  @ 2.33GHz
stepping: 6
cpu MHz : 2333.416
cache size  : 4096 KB
physical id : 0
siblings: 1
core id : 0
cpu cores   : 1
fpu : yes
fpu_exception   : yes
cpuid level : 10
wp  : yes
flags   : fpu de tsc msr pae cx8 apic sep cmov pat clflush acpi mmx
fxsr sse sse2 ss ht syscall lm constant_tsc pni cx16 lahf_lm
bogomips: 5835.83
clflush size: 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:







Oleg Bartunov wrote:


Artur,

I don't know, but could you try linux machine ?

Oleg






Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Incorrect FTS result with GIN index

2010-07-20 Thread Artur Dabrowski

Oleg,

thanks for your help.

I sent a post to pg-hackers list:
http://old.nabble.com/Query-results-differ-depending-on-operating-system-%28using-GIN%29-ts29213082.html

As to compiling pg... I will no do this since I do not really feel
comfortable doing it and cannot dedicate too much time to this problem.

Artur



Oleg Bartunov wrote:
 
 Artur,
 
 I recommend post your problem to -hackers mailing list. I have no idea,
 what could be a problem.
 
 My machine is:
 uname -a
 Linux mira 2.6.33-020633-generic #020633 SMP Thu Feb 25 10:10:03 UTC 2010
 x86_64 GNU/Linux
 
 PostgreSQL 8.4.4 on x86_64-unknown-linux-gnu, compiled by GCC gcc (Ubuntu
 4.4.1-4ubuntu9) 4.4.1, 64-bit
 
 As a last resort I recommend you to compile pg yourself and see if the 
 problem exists.
 
 Oleg
 
 
 On Tue, 20 Jul 2010, Artur Dabrowski wrote:
 

 I tested the same backup on our CentOS 5.4 virtual machine (running on
 xen
 server) and the results are really weird (118 rows, comparing to 116 on
 win
 xp and 123 expected):

 Aggregate  (cost=104.00..104.01 rows=1 width=0) (actual
 time=120.373..120.374 rows=1 loops=1)
  -  Bitmap Heap Scan on search_tab  (cost=5.35..103.93 rows=25 width=0)
 (actual time=59.418..120.137 rows=118 loops=1)
Recheck Cond: ((to_tsvector('german'::regconfig, keywords) @@
 '''ee'':*'::tsquery) AND (to_tsvector('german'::regconfig, keywords) @@
 '''dd'':*'::tsquery))
-  Bitmap Index Scan on idx_keywords_ger  (cost=0.00..5.34
 rows=25
 width=0) (actual time=59.229..59.229 rows=495 loops=1)
  Index Cond: ((to_tsvector('german'::regconfig, keywords) @@
 '''ee'':*'::tsquery) AND (to_tsvector('german'::regconfig, keywords) @@
 '''dd'':*'::tsquery))
 Total runtime: 120.670 ms

 And here are the configuration details:

 PostgreSQL:
 postgresql84-server-8.4.4-1.el5_5.1

 # uname -r
 2.6.18-164.15.1.el5xen

 # cat /etc/redhat-release
 CentOS release 5.4 (Final)

 # cat /proc/cpuinfo
 processor   : 0
 vendor_id   : GenuineIntel
 cpu family  : 6
 model   : 15
 model name  : Intel(R) Xeon(R) CPU5140  @ 2.33GHz
 stepping: 6
 cpu MHz : 2333.416
 cache size  : 4096 KB
 physical id : 0
 siblings: 1
 core id : 0
 cpu cores   : 1
 fpu : yes
 fpu_exception   : yes
 cpuid level : 10
 wp  : yes
 flags   : fpu de tsc msr pae cx8 apic sep cmov pat clflush acpi
 mmx
 fxsr sse sse2 ss ht syscall lm constant_tsc pni cx16 lahf_lm
 bogomips: 5835.83
 clflush size: 64
 cache_alignment : 64
 address sizes   : 36 bits physical, 48 bits virtual
 power management:







 Oleg Bartunov wrote:

 Artur,

 I don't know, but could you try linux machine ?

 Oleg



 
   Regards,
   Oleg
 _
 Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
 Sternberg Astronomical Institute, Moscow University, Russia
 Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
 phone: +007(495)939-16-83, +007(495)939-23-83
 
 -- 
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general
 
 

-- 
View this message in context: 
http://old.nabble.com/Incorrect-FTS-results-with-GIN-index-tp29172750p29215929.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Incorrect FTS result with GIN index

2010-07-19 Thread Artur Dabrowski

Hello Oleg,

my results are different. The analysis looks like this (please note the
different numbers of rows):

Aggregate  (cost=104.05..104.06 rows=1 width=0) (actual
time=152.133..152.135 rows=1 loops=1)
  -  Bitmap Heap Scan on search_tab  (cost=5.39..103.98 rows=25 width=0)
(actual time=76.546..151.834 rows=116 loops=1)
Recheck Cond: ((to_tsvector('german'::regconfig, keywords) @@
'''ee'':*'::tsquery) AND (to_tsvector('german'::regconfig, keywords) @@
'''dd'':*'::tsquery))
-  Bitmap Index Scan on idx_keywords_ger  (cost=0.00..5.38 rows=25
width=0) (actual time=76.292..76.292 rows=506 loops=1)
  Index Cond: ((to_tsvector('german'::regconfig, keywords) @@
'''ee'':*'::tsquery) AND (to_tsvector('german'::regconfig, keywords) @@
'''dd'':*'::tsquery))
Total runtime: 152.389 ms


I have no idea, what could be the reason for different behaviour on your and
my machine (windows xp, postgreSQL 8.4.3)?  
I reproduced the same wrong behaviour on a machine of my co-worker (windows
xp, postgreSQL 8.4.4).


-- 
View this message in context: 
http://old.nabble.com/Incorrect-FTS-results-with-GIN-index-tp29172750p29203020.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Incorrect FTS result with GIN index

2010-07-19 Thread Oleg Bartunov

Artur,

I don't know, but could you try linux machine ?

Oleg
On Mon, 19 Jul 2010, Artur Dabrowski wrote:



Hello Oleg,

my results are different. The analysis looks like this (please note the
different numbers of rows):

Aggregate  (cost=104.05..104.06 rows=1 width=0) (actual
time=152.133..152.135 rows=1 loops=1)
 -  Bitmap Heap Scan on search_tab  (cost=5.39..103.98 rows=25 width=0)
(actual time=76.546..151.834 rows=116 loops=1)
   Recheck Cond: ((to_tsvector('german'::regconfig, keywords) @@
'''ee'':*'::tsquery) AND (to_tsvector('german'::regconfig, keywords) @@
'''dd'':*'::tsquery))
   -  Bitmap Index Scan on idx_keywords_ger  (cost=0.00..5.38 rows=25
width=0) (actual time=76.292..76.292 rows=506 loops=1)
 Index Cond: ((to_tsvector('german'::regconfig, keywords) @@
'''ee'':*'::tsquery) AND (to_tsvector('german'::regconfig, keywords) @@
'''dd'':*'::tsquery))
Total runtime: 152.389 ms


I have no idea, what could be the reason for different behaviour on your and
my machine (windows xp, postgreSQL 8.4.3)?
I reproduced the same wrong behaviour on a machine of my co-worker (windows
xp, postgreSQL 8.4.4).





Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Incorrect FTS result with GIN index

2010-07-17 Thread Oleg Bartunov

Artur,

I downloaded your dump and tried your queries with index, I see no problem
so far.

  Table public.search_tab
 Column |  Type   |Modifiers 
+-+--

 id | integer | not null default 
nextval('search_tab_id_seq1'::regclass)
 keywords   | text|
 collection_urn | text|
 bbox   | text|
 object_urn | text| not null
 description| text|
 category   | text|
 summary| text|
 priority   | integer | 
Indexes:

search_tab_pkey1 PRIMARY KEY, btree (id)
idx_keywords_ger gin (to_tsvector('german'::regconfig, keywords))

test=# explain analyze select count(*) from search_tab 
where (to_tsvector('german', keywords ) @@ to_tsquery('german', 'ee:*')) 
and   (to_tsvector('german', keywords ) @@ to_tsquery('german', 'dd:*'));
   QUERY PLAN 
-

 Aggregate  (cost=103.87..103.88 rows=1 width=0) (actual time=24.784..24.784 
rows=1 loops=1)
   -  Bitmap Heap Scan on search_tab  (cost=5.21..103.80 rows=25 width=0) 
(actual time=24.642..24.769 rows=123 loops=1)
 Recheck Cond: ((to_tsvector('german'::regconfig, keywords) @@ 
'''ee'':*'::tsquery) AND (to_tsvector('german'::regconfig, keywords) @@ 
'''dd'':*'::tsquery))
 -  Bitmap Index Scan on idx_keywords_ger  (cost=0.00..5.21 rows=25 
width=0) (actual time=24.620..24.620 rows=123 loops=1)
   Index Cond: ((to_tsvector('german'::regconfig, keywords) @@ 
'''ee'':*'::tsquery) AND (to_tsvector('german'::regconfig, keywords) @@ 
'''dd'':*'::tsquery))
 Total runtime: 24.830 ms
(6 rows)

see rows=123


On Thu, 15 Jul 2010, Artur Dabrowski wrote:



Hello,

I was trying to use GIN index, but the results seem be incorrect.


1. QUERY WITHOUT INDEX
select count(*) from search_tab where
(to_tsvector('german', keywords ) @@ to_tsquery('german', 'ee:*')) and
(to_tsvector('german', keywords ) @@ to_tsquery('german', 'dd:*'));

count
---
  123
(1 row)


2. CREATING INDEX
create index idx_keywords_ger on search_tab
using gin(to_tsvector('german', keywords));


3. QUERY WITH INDEX
select count(*) from search_tab where
(to_tsvector('german', keywords ) @@ to_tsquery('german', 'ee:*')) and
(to_tsvector('german', keywords ) @@ to_tsquery('german', 'dd:*'));

count
---
  116
(1 row)


The number of rows is different. To make things more funny and ensure
problem is not caused by dictionary normalisation:

4. EQUIVALENT QUERY WITH INDEX
select count(*) from search_tab where
(to_tsvector('german', keywords ) @@ to_tsquery('german', 'ee:*  dd:*'));

count
---
  123
(1 row)

I tried the same with simple-based dictionary. The problem is always
reproducible.

Total count of records in my database is 1 006 300 if it matters.

One of missing results is the following: lSWN eeIf hInEI IN
SIL3WugEOANcEGVWL1L LBAGAeLlGS ttfL DDhuDEIni9 ce. If the query is more
specifically targeted to find this row then it founds it:

5. MORE DETAILED QUERY WITH INDEX
select keywords from search_tab where
(to_tsvector('german', keywords ) @@ to_tsquery('german', 'eeI:*  dd:*'));

keywords



lSWN eeIf hInEI IN SIL3WugEOANcEGVWL1L LBAGAeLlGS ttfL DDhuDEIni9 ce
tSALWIEEIn-3WNecGAINfLuLAV DDLIWNG E Lt h c8  BiIfgGl1 EeIhulSLenS6LDe5O
hGn DDlhIgGEAcS1O eeiEEI WnILWELS68VBLL AGNIAfINt6 lLuWuNeDc ItLfe SL
hGe WIiI EeItnLLuA1efOh3ALWc  uGINEltcIBE LnegLDNA3 DD SVNG LSSIlWfE
eeIW ItueS W39LnELg-GuDLEhAn8BeFG IVi DDNEfLG1SI 1tNIOA  lAhNLLccfWISE l
6em on.0nsRH nehSA2l1HAsauncu0I65l7 ddnsn1SAS i u0eLAnlr t70gaains w gzsH
eeiog
rfiwgso0g364l1 1wU eei1n 5lL dDA 0
DDInNcEfSWAEAtcL1IeSuAG5LE Lilh8tEGeDg f3B eEIOL7h uWV-L1IGN LINWeIn l S
ils eeiru00ewH.6sgAeHoSlLhglso0 asn0u2a atisA0 ddcngAnzRA Se Au2 nm8ns0
uS8snH
DDD EWlE1GShhLe8L NENI  tuL cgGGInfcBAlLfIO L1S eeIWeAEnILStu AViWNI
n IOLLt 0Alih tuWNE L nAGlVSNSDI DDeW BIegfG EeIhL9ELeScELWGAIfN1uIc
DnSE eeIWLu9tLNhNEuAt I1BelhGGfLWLS nSWINI eiELgAIG DDLEclV7 IO c Af
EeIElfN L4I lE2G cSOLniAWgSVItc ILDN L57BuDfALtSIe-WnGhGIW DDA NE1Lhuee
hNILN DD L6flSEeW1gthfI L1WAlENE eEIGIAt VGBDO uGLeLccAeSuLWIn Ii nS
(14 rows)


Did I misunderstood something or is it a bug?

Best regards
Artur



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-general mailing list 

[GENERAL] Incorrect FTS result with GIN index

2010-07-15 Thread Artur Dabrowski

Hello, 

I was trying to use GIN index, but the results seem be incorrect.


1. QUERY WITHOUT INDEX
select count(*) from search_tab where 
(to_tsvector('german', keywords ) @@ to_tsquery('german', 'ee:*')) and 
(to_tsvector('german', keywords ) @@ to_tsquery('german', 'dd:*'));

 count
---
   123
(1 row)


2. CREATING INDEX
create index idx_keywords_ger on search_tab 
using gin(to_tsvector('german', keywords));


3. QUERY WITH INDEX
select count(*) from search_tab where 
(to_tsvector('german', keywords ) @@ to_tsquery('german', 'ee:*')) and
(to_tsvector('german', keywords ) @@ to_tsquery('german', 'dd:*'));

 count
---
   116
(1 row)


The number of rows is different. To make things more funny and ensure
problem is not caused by dictionary normalisation:

4. EQUIVALENT QUERY WITH INDEX
select count(*) from search_tab where 
(to_tsvector('german', keywords ) @@ to_tsquery('german', 'ee:*  dd:*'));

 count
---
   123
(1 row)

I tried the same with simple-based dictionary. The problem is always
reproducible. 

Total count of records in my database is 1 006 300 if it matters.

One of missing results is the following: lSWN eeIf hInEI IN
SIL3WugEOANcEGVWL1L LBAGAeLlGS ttfL DDhuDEIni9 ce. If the query is more
specifically targeted to find this row then it founds it:

5. MORE DETAILED QUERY WITH INDEX
select keywords from search_tab where 
(to_tsvector('german', keywords ) @@ to_tsquery('german', 'eeI:*  dd:*'));

 keywords



lSWN eeIf hInEI IN SIL3WugEOANcEGVWL1L LBAGAeLlGS ttfL DDhuDEIni9 ce
tSALWIEEIn-3WNecGAINfLuLAV DDLIWNG E Lt h c8  BiIfgGl1 EeIhulSLenS6LDe5O
hGn DDlhIgGEAcS1O eeiEEI WnILWELS68VBLL AGNIAfINt6 lLuWuNeDc ItLfe SL
hGe WIiI EeItnLLuA1efOh3ALWc  uGINEltcIBE LnegLDNA3 DD SVNG LSSIlWfE
eeIW ItueS W39LnELg-GuDLEhAn8BeFG IVi DDNEfLG1SI 1tNIOA  lAhNLLccfWISE l
6em on.0nsRH nehSA2l1HAsauncu0I65l7 ddnsn1SAS i u0eLAnlr t70gaains w gzsH
eeiog
rfiwgso0g364l1 1wU eei1n 5lL dDA 0
DDInNcEfSWAEAtcL1IeSuAG5LE Lilh8tEGeDg f3B eEIOL7h uWV-L1IGN LINWeIn l S
ils eeiru00ewH.6sgAeHoSlLhglso0 asn0u2a atisA0 ddcngAnzRA Se Au2 nm8ns0
uS8snH
DDD EWlE1GShhLe8L NENI  tuL cgGGInfcBAlLfIO L1S eeIWeAEnILStu AViWNI
n IOLLt 0Alih tuWNE L nAGlVSNSDI DDeW BIegfG EeIhL9ELeScELWGAIfN1uIc
DnSE eeIWLu9tLNhNEuAt I1BelhGGfLWLS nSWINI eiELgAIG DDLEclV7 IO c Af
EeIElfN L4I lE2G cSOLniAWgSVItc ILDN L57BuDfALtSIe-WnGhGIW DDA NE1Lhuee
hNILN DD L6flSEeW1gthfI L1WAlENE eEIGIAt VGBDO uGLeLccAeSuLWIn Ii nS
(14 rows)


Did I misunderstood something or is it a bug?

Best regards
Artur
-- 
View this message in context: 
http://old.nabble.com/Incorrect-FTS-result-with-GIN-index-tp29172750p29172750.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.