Re: [GENERAL] FTS GIN Index Question

2010-09-29 Thread Oleg Bartunov

Christian,

On Wed, 29 Sep 2010, Christian Ramseyer wrote:


Hi List

I have a largish partitioned table, it has ~60 million records in each of 12 
partitions. It appears that a Full Text Index could speed up some user 
queries a lot.


A quick test with an additional tsvector column revealed that this would take 
up around 35 GB of space for this column and then maybe 5 more for the gin 
index on it. As this is a lot of space (~ 480 GB), I'm a bit tempted to use a 
gin index without the separate tsvector column. However, the doc says that 
this will be slower.


do you have problem with disk space ? Searching index is usually very fast
operation, only small part of index readed.  Did you checked time to read
index ?



Does anyone have an idea of how much slower we're talking here? The index 
defintion would be a concatenation of two setweights(), i.e.:


... using gin(
 (setweight(to_tsvector('config',coalesce(col1,'')), 'A') ||
  setweight(to_tsvector('config',coalesce(col2,'')), 'B')))

Also, general recommendations regarding full text search configurations of 
that size are very welcome.


Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] FTS GIN Index Question

2010-09-29 Thread Christian Ramseyer

On 09/29/2010 12:10 PM, Oleg Bartunov wrote:

Christian,

On Wed, 29 Sep 2010, Christian Ramseyer wrote:


Hi List

I have a largish partitioned table, it has ~60 million records in each
of 12 partitions. It appears that a Full Text Index could speed up
some user queries a lot.

A quick test with an additional tsvector column revealed that this
would take up around 35 GB of space for this column and then maybe 5
more for the gin index on it. As this is a lot of space (~ 480 GB),
I'm a bit tempted to use a gin index without the separate tsvector
column. However, the doc says that this will be slower.


do you have problem with disk space ? Searching index is usually very fast
operation, only small part of index readed. Did you checked time to read
index ?



Hi Oleg, thanks for your reply,

Well I could get the disk space, but it's on a corporate SAN so it's a 
bit tedious. I didn't compare directly so far since even creating both 
index versions for only a single partition would bring me close to the 
space limit and also take maybe 20 hours or so. So my idea was to ask 
here first if I'm looking at a a few percent or a orders of 
magnitude difference. But of course, if there isn't enough feeback for 
an informed decision I'll either try both or just go with the 
space-intensive variant to be on the safe side, user experience is 
certainly more important than disk usage.


Christian




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] FTS GIN Index Question

2010-09-28 Thread Christian Ramseyer

Hi List

I have a largish partitioned table, it has ~60 million records in each 
of 12 partitions. It appears that a Full Text Index could speed up some 
user queries a lot.


A quick test with an additional tsvector column revealed that this would 
take up around 35 GB of space for this column and then maybe 5 more for 
the gin index on it. As this is a lot of space (~ 480 GB), I'm a bit 
tempted to use a gin index without the separate tsvector column. 
However, the doc says that this will be slower.


Does anyone have an idea of how much slower we're talking here? The 
index defintion would be a concatenation of two setweights(), i.e.:


... using gin(
  (setweight(to_tsvector('config',coalesce(col1,'')), 'A') ||
   setweight(to_tsvector('config',coalesce(col2,'')), 'B')))

Also, general recommendations regarding full text search configurations 
of that size are very welcome.


Christian




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general