Re: [GENERAL] Differences between to_tsquery() and plainto_tsquery() when it comes to applying an index?

2010-08-26 Thread Denis Papathanasiou

That index doesn't match the query, so it can't be used.  Try

select item_pk from node where 
tag='primaryIssuer.entityType' and

to_tsvector('english', val) @@ plainto_tsquery('Limited Partnership');


Tom and Oleg: thank you for clarifying this.

I see where I made the mistake in applying the example from the 
documentation.



Note that seeing that the scan on the other index is pretty cheap,
it's not obvious that indexing the @@ clause is better anyway.


So is it the case that, as in the documented example, I need to add a 
column of type ts_vector to the table for the index to be most effective?


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Differences between to_tsquery() and plainto_tsquery() when it comes to applying an index?

2010-08-26 Thread Tom Lane
Denis Papathanasiou denis.papathanas...@gmail.com writes:
 So is it the case that, as in the documented example, I need to add a 
 column of type ts_vector to the table for the index to be most effective?

You can do it either way, depending on what you want the queries to look
like.  One advantage of materializing the tsvector entries in the table
is that the to_tsvector() work doesn't have to be done over again if
the planner decides not to use that index for some reason (for instance,
it thinks the other index is more effective for the particular query).

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Differences between to_tsquery() and plainto_tsquery() when it comes to applying an index?

2010-08-25 Thread Denis Papathanasiou

As a follow-up to my question from this past Saturday
(http://archives.postgresql.org/pgsql-general/2010-08/msg00624.php), I
experimented with adding two types of indices to the same text column:

CREATE INDEX item_eng_searchable_text_idx ON item USING
gin(to_tsvector('english', searchable_text));

and

CREATE INDEX item_searchable_text_idx ON item (searchable_text);

Running my queries under explain, I noticed that queries of the form:

select pk from item where searchable_text @@ plainto_tsquery('search
phrase');

Actually run *slower* with the item_eng_searchable_text_idx index applied.

But when I switch the query to use to_tsquery() instead, e.g. something
like this:

select pk from item where searchable_text @@ to_tsquery('phrase');

The performance is better.

Is this because the gin/to_tsvector() index works differently for
to_tsquery() compared to plainto_ts_query() ?

If so, how can I create an effective index for queries that will use
plainto_tsquery() ?

Note that I need the plainto_tsquery() form b/c my search phrases will
correspond to exact blocks of text, and therefore they will contain
spaces and punctuation, etc.


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Differences between to_tsquery() and plainto_tsquery() when it comes to applying an index?

2010-08-25 Thread Tom Lane
Denis Papathanasiou denis.papathanas...@gmail.com writes:
 Is this because the gin/to_tsvector() index works differently for
 to_tsquery() compared to plainto_ts_query() ?

Not per se, but maybe looking at the actual outputs of the two function
calls would be enlightening.  I suspect you'll find that the search
conditions you are getting are not equivalent.

 Note that I need the plainto_tsquery() form b/c my search phrases will
 correspond to exact blocks of text, and therefore they will contain
 spaces and punctuation, etc.

In that case you haven't understood how text search works at all.
It simply doesn't do that.  You could possibly follow up a text search
for component words with a LIKE or similar comparison to verify that
the matches actually contain the desired string.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Differences between to_tsquery() and plainto_tsquery() when it comes to applying an index?

2010-08-25 Thread Denis Papathanasiou



Not per se, but maybe looking at the actual outputs of the two function
calls would be enlightening.  I suspect you'll find that the search
conditions you are getting are not equivalent.


Strictly speaking they're not, b/c the plainto_tsquery() is chaining 
several tokens together.


However, at the heart of the question is this: if I define the index on 
that column like this:


CREATE INDEX item_eng_searchable_text_idx ON item USING 
gin(to_tsvector('english', searchable_text));


since, unlike the example in the 
http://www.postgresql.org/docs/8.4/interactive/textsearch-tables.html 
doc, I cannot define the searchable

text column in the table as being being of type ts_vector(), because the
text is not exclusively in English;

then, will the index be effective?

Based on some empirical experiments, it seems not.

If that's indeed that case, then how should I create the index?


In that case you haven't understood how text search works at all.
It simply doesn't do that.  You could possibly follow up a text search
for component words with a LIKE or similar comparison to verify that
the matches actually contain the desired string.


I cannot use LIKE on each token of the phrase because the text in the 
column is unordered, and I would have to do an ILIKE '%'+token+'%' on 
each to be logically correct.


IIRC, wildcards of the form token+'%' can use an index efficiently, but 
wildcards on both ends such as '%'+token+'%' do not.


I did think about splitting the phrase tokens and doing a tsquery() join 
on each token, but it seemed that's why plainto_tsquery() was created, 
unless I misunderstood the docs on that specific point.


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Differences between to_tsquery() and plainto_tsquery() when it comes to applying an index?

2010-08-25 Thread Oleg Bartunov

Denis,

we need examples of your explain analyze. I don't want to waste my time
reading theoretical reasoning :)

btw, Be sure you use the same search configuration as in create index or
index will not be used at all.

Oleg

On Wed, 25 Aug 2010, Denis Papathanasiou wrote:


As a follow-up to my question from this past Saturday
(http://archives.postgresql.org/pgsql-general/2010-08/msg00624.php), I
experimented with adding two types of indices to the same text column:

CREATE INDEX item_eng_searchable_text_idx ON item USING
gin(to_tsvector('english', searchable_text));

and

CREATE INDEX item_searchable_text_idx ON item (searchable_text);

Running my queries under explain, I noticed that queries of the form:

select pk from item where searchable_text @@ plainto_tsquery('search
phrase');

Actually run *slower* with the item_eng_searchable_text_idx index applied.

But when I switch the query to use to_tsquery() instead, e.g. something
like this:

select pk from item where searchable_text @@ to_tsquery('phrase');

The performance is better.

Is this because the gin/to_tsvector() index works differently for
to_tsquery() compared to plainto_ts_query() ?

If so, how can I create an effective index for queries that will use
plainto_tsquery() ?

Note that I need the plainto_tsquery() form b/c my search phrases will
correspond to exact blocks of text, and therefore they will contain
spaces and punctuation, etc.





Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Differences between to_tsquery() and plainto_tsquery() when it comes to applying an index?

2010-08-25 Thread Denis Papathanasiou



we need examples of your explain analyze. I don't want to waste my time
reading theoretical reasoning :)


Here's an actual 'explain analyze' example:

alerts= CREATE INDEX node_val_tsv_idx ON node USING 
gin(to_tsvector('english', val));

CREATE INDEX
alerts= explain analyze select item_pk from node where 
tag='primaryIssuer.entityType' and val @@ plainto_tsquery('Limited 
Partnership');
 QUERY PLAN 


-
 Bitmap Heap Scan on node  (cost=204.26..5792.92 rows=4 width=16) 
(actual time=2.952..131.868 rows=953 loops=1)

   Recheck Cond: (tag = 'primaryIssuer.entityType'::text)
   Filter: (val @@ plainto_tsquery('Limited Partnership'::text))
   -  Bitmap Index Scan on node_tag_idx  (cost=0.00..204.26 rows=3712 
width=0) (actual time=1.628..1.628 rows=3631 loops=1)

 Index Cond: (tag = 'primaryIssuer.entityType'::text)
 Total runtime: 133.345 ms
(6 rows)

alerts= DROP INDEX node_val_tsv_idx;
DROP INDEX
alerts= explain analyze select item_pk from node where 
tag='primaryIssuer.entityType' and val @@ plainto_tsquery('Limited 
Partnership');
 QUERY PLAN 


-
 Bitmap Heap Scan on node  (cost=204.26..5792.92 rows=4 width=16) 
(actual time=2.938..93.239 rows=953 loops=1)

   Recheck Cond: (tag = 'primaryIssuer.entityType'::text)
   Filter: (val @@ plainto_tsquery('Limited Partnership'::text))
   -  Bitmap Index Scan on node_tag_idx  (cost=0.00..204.26 rows=3712 
width=0) (actual time=1.614..1.614 rows=3631 loops=1)

 Index Cond: (tag = 'primaryIssuer.entityType'::text)
 Total runtime: 94.696 ms
(6 rows)

The table this is run against is defined like this:

CREATE TABLE node (
pk uuid primary key,
item_pk uuid not null references item (pk),
tag text not null,
val text
);

In addition to the gin/ts_vector index on node.val shown above, there 
are two other explicit indices on this table:


CREATE INDEX node_tag_idx ON node (tag);
CREATE INDEX node_val_idx ON node (val);

The reason for the node_val_idx index is that there will be cases where 
the query phrase is known exactly, so the where clause in the select 
statement will be just val = 'Limited Partnership'.



btw, Be sure you use the same search configuration as in create index or
index will not be used at all.


Is this indeed the problem here?

The explain output references val @@ plainto_tsquery() but as a 
filter, whereas the tag portion of the statement mentions node_tag_idx 
as the index it used.


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Differences between to_tsquery() and plainto_tsquery() when it comes to applying an index?

2010-08-25 Thread Oleg Bartunov

Try this select

alerts= explain analyze select item_pk from node where
 tag='primaryIssuer.entityType' and val @@ plainto_tsquery('english','Limited
 Partnership');

Read 12.2.2. Creating Indexes at 
http://www.postgresql.org/docs/8.4/static/textsearch-tables.html

CREATE INDEX pgweb_idx ON pgweb USING gin(to_tsvector('english', body));

Notice that the 2-argument version of to_tsvector is used. Only text search 
functions that specify a configuration name can be used in expression indexes 
(Section 11.7). This is because the index contents must be unaffected by 
default_text_search_config. If they were affected, the index contents might be 
inconsistent because different entries could contain tsvectors that were 
created with different text search configurations, and there would be no way to 
guess which was which. It would be impossible to dump and restore such an index 
correctly.



Oleg

On Wed, 25 Aug 2010, Denis Papathanasiou wrote:




we need examples of your explain analyze. I don't want to waste my time
reading theoretical reasoning :)


Here's an actual 'explain analyze' example:

alerts= CREATE INDEX node_val_tsv_idx ON node USING 
gin(to_tsvector('english', val));

CREATE INDEX
alerts= explain analyze select item_pk from node where 
tag='primaryIssuer.entityType' and val @@ plainto_tsquery('Limited 
Partnership');
QUERY PLAN 
-
Bitmap Heap Scan on node  (cost=204.26..5792.92 rows=4 width=16) (actual 
time=2.952..131.868 rows=953 loops=1)

  Recheck Cond: (tag = 'primaryIssuer.entityType'::text)
  Filter: (val @@ plainto_tsquery('Limited Partnership'::text))
  -  Bitmap Index Scan on node_tag_idx  (cost=0.00..204.26 rows=3712 
width=0) (actual time=1.628..1.628 rows=3631 loops=1)

Index Cond: (tag = 'primaryIssuer.entityType'::text)
Total runtime: 133.345 ms
(6 rows)

alerts= DROP INDEX node_val_tsv_idx;
DROP INDEX
alerts= explain analyze select item_pk from node where 
tag='primaryIssuer.entityType' and val @@ plainto_tsquery('Limited 
Partnership');
QUERY PLAN 
-
Bitmap Heap Scan on node  (cost=204.26..5792.92 rows=4 width=16) (actual 
time=2.938..93.239 rows=953 loops=1)

  Recheck Cond: (tag = 'primaryIssuer.entityType'::text)
  Filter: (val @@ plainto_tsquery('Limited Partnership'::text))
  -  Bitmap Index Scan on node_tag_idx  (cost=0.00..204.26 rows=3712 
width=0) (actual time=1.614..1.614 rows=3631 loops=1)

Index Cond: (tag = 'primaryIssuer.entityType'::text)
Total runtime: 94.696 ms
(6 rows)

The table this is run against is defined like this:

CREATE TABLE node (
   pk uuid primary key,
   item_pk uuid not null references item (pk),
   tag text not null,
   val text
);

In addition to the gin/ts_vector index on node.val shown above, there are two 
other explicit indices on this table:


CREATE INDEX node_tag_idx ON node (tag);
CREATE INDEX node_val_idx ON node (val);

The reason for the node_val_idx index is that there will be cases where the 
query phrase is known exactly, so the where clause in the select statement 
will be just val = 'Limited Partnership'.



btw, Be sure you use the same search configuration as in create index or
index will not be used at all.


Is this indeed the problem here?

The explain output references val @@ plainto_tsquery() but as a filter, 
whereas the tag portion of the statement mentions node_tag_idx as the index 
it used.




Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Differences between to_tsquery() and plainto_tsquery() when it comes to applying an index?

2010-08-25 Thread Tom Lane
Denis Papathanasiou denis.papathanas...@gmail.com writes:
 alerts= CREATE INDEX node_val_tsv_idx ON node USING 
 gin(to_tsvector('english', val));
 CREATE INDEX
 alerts= explain analyze select item_pk from node where 
 tag='primaryIssuer.entityType' and val @@ plainto_tsquery('Limited 
 Partnership');
 [ doesn't use the index ]

That index doesn't match the query, so it can't be used.  Try

select item_pk from node where 
tag='primaryIssuer.entityType' and
to_tsvector('english', val) @@ plainto_tsquery('Limited Partnership');

Note that seeing that the scan on the other index is pretty cheap,
it's not obvious that indexing the @@ clause is better anyway.
But it *can't* index this query using this index.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general