Re: [GENERAL] Shrinking TSvectors

2016-04-05 Thread Howard News



On 05/04/2016 15:15, Artur Zakirov wrote:

On 05.04.2016 14:37, Howard News wrote:

Hi,

does anyone have any pointers for shrinking tsvectors

I have looked at the contents of some of these fields and they contain
many details that are not needed. For example...

"'+1':935,942 '-0500':72 '-0578':932 '-0667':938 '-266':937 '-873':944
'-9972':945 '/partners/application.html':222
'/partners/program/program-agreement.pdf':271
'/partners/reseller.html':181,1073 '01756':50,1083 '07767':54,1087
'1':753,771 '12':366 '14':66 (...)"

I am not interested in keeping the numbers or urls in the indexes.

Thanks,

Howard.




Hello,

You need to create a new text search configuration. Here is an example of
the commands:


CREATE TEXT SEARCH CONFIGURATION public.english_cfg (
    PARSER = default
);
ALTER TEXT SEARCH CONFIGURATION public.english_cfg
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH pg_catalog.english_stem;

Instead of "pg_catalog.english_stem", you can use your own dictionary.
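
As a rough illustration of the "own dictionary" option, the sketch below defines a dictionary from the built-in snowball template and maps the word token types to it. The name public.english_custom is hypothetical, and the sketch assumes public.english_cfg has already been created with the commands above and that the stock English stop word file is installed.

```sql
-- Hypothetical dictionary name; snowball is the stock stemmer template.
CREATE TEXT SEARCH DICTIONARY public.english_custom (
    TEMPLATE = snowball,
    Language = english,
    StopWords = english
);

-- Replace the dictionary used for the word-like token types.
ALTER TEXT SEARCH CONFIGURATION public.english_cfg
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH public.english_custom;
```

Token types that have no mapping in the configuration are still discarded, so swapping the dictionary does not bring numbers or URLs back.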

Let's compare the new configuration with the built-in configuration
"pg_catalog.english":


postgres=# select to_tsvector('english_cfg', 'home -9972 /partners/application.html /partners/program/program-agreement.pdf');
 to_tsvector
-------------
 'home':1
(1 row)

postgres=# select to_tsvector('english', 'home -9972 /partners/application.html /partners/program/program-agreement.pdf');
                                          to_tsvector
-----------------------------------------------------------------------------------------------
 '-9972':2 '/partners/application.html':3 '/partners/program/program-agreement.pdf':4 'home':1
(1 row)


You can get some additional information about configurations using \dF+:

postgres=# \dF+ english
Text search configuration "pg_catalog.english"
Parser: "pg_catalog.default"
      Token      | Dictionaries
-----------------+--------------
 asciihword      | english_stem
 asciiword       | english_stem
 email           | simple
 file            | simple
 float           | simple
 host            | simple
 hword           | english_stem
 hword_asciipart | english_stem
 hword_numpart   | simple
 hword_part      | english_stem
 int             | simple
 numhword        | simple
 numword         | simple
 sfloat          | simple
 uint            | simple
 url             | simple
 url_path        | simple
 version         | simple
 word            | english_stem

postgres=# \dF+ english_cfg
Text search configuration "public.english_cfg"
Parser: "pg_catalog.default"
      Token      | Dictionaries
-----------------+--------------
 asciihword      | english_stem
 asciiword       | english_stem
 hword           | english_stem
 hword_asciipart | english_stem
 hword_part      | english_stem
 word            | english_stem
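
To see which token type the parser assigns to each piece of text, and therefore which mappings matter, ts_debug can be run against both configurations. This is only a sketch; the sample text is taken from the examples above, and the second query assumes public.english_cfg exists.

```sql
-- List the token types known to the default parser.
SELECT alias, description FROM ts_token_type('default');

-- Show how each token is classified and which lexemes it produces.
SELECT alias, token, dictionaries, lexemes
FROM ts_debug('english',
              'home -9972 /partners/application.html');

-- Same text under the reduced configuration.
SELECT alias, token, dictionaries, lexemes
FROM ts_debug('public.english_cfg',
              'home -9972 /partners/application.html');
```

Tokens whose type has no mapping in the configuration should appear with an empty dictionaries list and no lexemes.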


Thanks Artur,

That's amazing! Postgres never ceases to amaze me. And the same goes for
the contributors to this list.






--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Shrinking TSvectors

2016-04-05 Thread Adrian Klaver

On 04/05/2016 07:37 AM, Howard News wrote:



On 05/04/2016 14:44, Oleg Bartunov wrote:



On Tue, Apr 5, 2016 at 2:37 PM, Howard News wrote:

Hi,

does anyone have any pointers for shrinking tsvectors

I have looked at the contents of some of these fields and they
contain many details that are not needed. For example...

"'+1':935,942 '-0500':72 '-0578':932 '-0667':938 '-266':937
'-873':944 '-9972':945 '/partners/application.html':222
'/partners/program/program-agreement.pdf':271
'/partners/reseller.html':181,1073 '01756':50,1083 '07767':54,1087
'1':753,771 '12':366 '14':66 (...)"

I am not interested in keeping the numbers or urls in the indexes.



select strip ('asd:23');
 strip
-------
 'asd'
(1 row)



Hi Oleg,

Is this function documented anywhere?


http://www.postgresql.org/docs/9.5/static/functions-textsearch.html



Howard.



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: [GENERAL] Shrinking TSvectors

2016-04-05 Thread Alexander Shereshevsky
On Tue, Apr 5, 2016 at 5:37 PM, Howard News 
wrote:

>
>
> On 05/04/2016 14:44, Oleg Bartunov wrote:
>
>
>
> On Tue, Apr 5, 2016 at 2:37 PM, Howard News 
> wrote:
>
>> Hi,
>>
>> does anyone have any pointers for shrinking tsvectors
>>
>> I have looked at the contents of some of these fields and they contain
>> many details that are not needed. For example...
>>
>> "'+1':935,942 '-0500':72 '-0578':932 '-0667':938 '-266':937 '-873':944
>> '-9972':945 '/partners/application.html':222
>> '/partners/program/program-agreement.pdf':271
>> '/partners/reseller.html':181,1073 '01756':50,1083 '07767':54,1087
>> '1':753,771 '12':366 '14':66 (...)"
>>
>
> I am not interested in keeping the numbers or urls in the indexes.
>>
>
>
> select strip ('asd:23');
>  strip
> -------
>  'asd'
> (1 row)
>
>
>
> Hi Oleg,
>
> Is this function documented anywhere?
>
> Howard.
>

http://www.postgresql.org/docs/9.4/static/textsearch-features.html#TEXTSEARCH-MANIPULATE-TSVECTOR


Re: [GENERAL] Shrinking TSvectors

2016-04-05 Thread Howard News



On 05/04/2016 14:44, Oleg Bartunov wrote:



On Tue, Apr 5, 2016 at 2:37 PM, Howard News wrote:


Hi,

does anyone have any pointers for shrinking tsvectors

I have looked at the contents of some of these fields and they
contain many details that are not needed. For example...

"'+1':935,942 '-0500':72 '-0578':932 '-0667':938 '-266':937
'-873':944 '-9972':945 '/partners/application.html':222
'/partners/program/program-agreement.pdf':271
'/partners/reseller.html':181,1073 '01756':50,1083 '07767':54,1087
'1':753,771 '12':366 '14':66 (...)"

I am not interested in keeping the numbers or urls in the indexes.



select strip ('asd:23');
 strip
-------
 'asd'
(1 row)



Hi Oleg,

Is this function documented anywhere?

Howard.


Re: [GENERAL] Shrinking TSvectors

2016-04-05 Thread Artur Zakirov

On 05.04.2016 14:37, Howard News wrote:

Hi,

does anyone have any pointers for shrinking tsvectors

I have looked at the contents of some of these fields and they contain
many details that are not needed. For example...

"'+1':935,942 '-0500':72 '-0578':932 '-0667':938 '-266':937 '-873':944
'-9972':945 '/partners/application.html':222
'/partners/program/program-agreement.pdf':271
'/partners/reseller.html':181,1073 '01756':50,1083 '07767':54,1087
'1':753,771 '12':366 '14':66 (...)"

I am not interested in keeping the numbers or urls in the indexes.

Thanks,

Howard.




Hello,

You need to create a new text search configuration. Here is an example of
the commands:


CREATE TEXT SEARCH CONFIGURATION public.english_cfg (
    PARSER = default
);
ALTER TEXT SEARCH CONFIGURATION public.english_cfg
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH pg_catalog.english_stem;
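
A new configuration does not change tsvectors that are already stored, so existing rows have to be regenerated with it. The following is a minimal sketch that assumes a hypothetical table docs with a text column body and a tsvector column tsv; none of these names come from the thread.

```sql
-- Hypothetical table and column names; adjust to the real schema.
UPDATE docs
SET tsv = to_tsvector('public.english_cfg', body);

-- Queries should be parsed with the same configuration as the stored vectors.
SELECT count(*)
FROM docs
WHERE tsv @@ to_tsquery('public.english_cfg', 'home');
```

If the column is maintained by a trigger or by application code, that code needs to name the new configuration as well. The space freed by the smaller vectors is not returned to the operating system until the table is vacuumed or rewritten.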

Instead of "pg_catalog.english_stem", you can use your own dictionary.

Let's compare the new configuration with the built-in configuration
"pg_catalog.english":


postgres=# select to_tsvector('english_cfg', 'home -9972 /partners/application.html /partners/program/program-agreement.pdf');
 to_tsvector
-------------
 'home':1
(1 row)

postgres=# select to_tsvector('english', 'home -9972 /partners/application.html /partners/program/program-agreement.pdf');
                                          to_tsvector
-----------------------------------------------------------------------------------------------
 '-9972':2 '/partners/application.html':3 '/partners/program/program-agreement.pdf':4 'home':1
(1 row)


You can get some additional information about configurations using \dF+:

postgres=# \dF+ english
Text search configuration "pg_catalog.english"
Parser: "pg_catalog.default"
      Token      | Dictionaries
-----------------+--------------
 asciihword      | english_stem
 asciiword       | english_stem
 email           | simple
 file            | simple
 float           | simple
 host            | simple
 hword           | english_stem
 hword_asciipart | english_stem
 hword_numpart   | simple
 hword_part      | english_stem
 int             | simple
 numhword        | simple
 numword         | simple
 sfloat          | simple
 uint            | simple
 url             | simple
 url_path        | simple
 version         | simple
 word            | english_stem

postgres=# \dF+ english_cfg
Text search configuration "public.english_cfg"
Parser: "pg_catalog.default"
      Token      | Dictionaries
-----------------+--------------
 asciihword      | english_stem
 asciiword       | english_stem
 hword           | english_stem
 hword_asciipart | english_stem
 hword_part      | english_stem
 word            | english_stem
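
An alternative that may be easier to maintain is to copy the built-in configuration and drop only the mappings that are not wanted, rather than starting from an empty one. This is a sketch, not something proposed in the thread; the name public.english_words is hypothetical and the list of dropped token types is only one possible choice.

```sql
-- Start from the built-in English configuration.
CREATE TEXT SEARCH CONFIGURATION public.english_words (
    COPY = pg_catalog.english
);

-- Stop indexing numbers, URLs, host names and file paths.
ALTER TEXT SEARCH CONFIGURATION public.english_words
    DROP MAPPING FOR int, uint, float, sfloat,
                     url, url_path, host, file;
```

With this variant, token types such as email, version, numword and numhword keep their "simple" mapping, so \dF+ should show a longer list than it does for public.english_cfg.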

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: [GENERAL] Shrinking TSvectors

2016-04-05 Thread Oleg Bartunov
On Tue, Apr 5, 2016 at 2:37 PM, Howard News 
wrote:

> Hi,
>
> does anyone have any pointers for shrinking tsvectors
>
> I have looked at the contents of some of these fields and they contain
> many details that are not needed. For example...
>
> "'+1':935,942 '-0500':72 '-0578':932 '-0667':938 '-266':937 '-873':944
> '-9972':945 '/partners/application.html':222
> '/partners/program/program-agreement.pdf':271
> '/partners/reseller.html':181,1073 '01756':50,1083 '07767':54,1087
> '1':753,771 '12':366 '14':66 (...)"
>

I am not interested in keeping the numbers or urls in the indexes.
>


select strip ('asd:23');
 strip
-------
 'asd'
(1 row)
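
Note that strip() removes the position and weight information but keeps every lexeme, so numbers and URLs remain in the vector; it only makes each entry smaller. A small sketch of the effect, using sample text rather than real data:

```sql
-- The lexemes survive; only the ":position" parts are dropped.
SELECT strip(to_tsvector('english', 'home -9972 home home'));

-- Rough size comparison of the same vector with and without positions.
SELECT pg_column_size(to_tsvector('english', 'home -9972 home home'))        AS with_positions,
       pg_column_size(strip(to_tsvector('english', 'home -9972 home home'))) AS stripped;
```

According to the documentation, relevance ranking does not work as well on stripped vectors, so this trades ranking quality for size.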



>
> Thanks,
>
> Howard.
>
>


[GENERAL] Shrinking TSvectors

2016-04-05 Thread Howard News

Hi,

Does anyone have any pointers for shrinking tsvectors?

I have looked at the contents of some of these fields and they contain 
many details that are not needed. For example...


"'+1':935,942 '-0500':72 '-0578':932 '-0667':938 '-266':937 '-873':944 
'-9972':945 '/partners/application.html':222 
'/partners/program/program-agreement.pdf':271 
'/partners/reseller.html':181,1073 '01756':50,1083 '07767':54,1087 
'1':753,771 '12':366 '14':66 (...)"


I am not interested in keeping the numbers or urls in the indexes.

Thanks,

Howard.

