Re: [GENERAL] Need help with full text index configuration

2010-07-29 Thread Oleg Bartunov

Brian,

You have two options:
1. Write your own parser (just modify the default one).
2. Use the replace function, like:
postgres=# select to_tsvector( replace('qw/er/ty','/',' '));
     to_tsvector
----------------------
 'er':2 'qw':1 'ty':3
(1 row)
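
The replace approach can be wrapped in a function so that the index and the queries always apply the same substitution. This is only a minimal sketch: the table docs, the column body, the function slash_tsvector and the index name are hypothetical names made up for illustration, and the english configuration is assumed.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE docs (id serial PRIMARY KEY, body text);
INSERT INTO docs (body) VALUES ('maybe five/six');

-- Hypothetical helper: turn every / into a space before parsing.
-- It is declared IMMUTABLE so it can appear in an index expression; that
-- relies on the two-argument to_tsvector(regconfig, text) form.
CREATE FUNCTION slash_tsvector(text) RETURNS tsvector AS $$
    SELECT to_tsvector('english'::regconfig, replace($1, '/', ' '));
$$ LANGUAGE sql IMMUTABLE;

-- Expression index over the preprocessed text.
CREATE INDEX docs_body_fts_idx ON docs USING gin (slash_tsvector(body));

-- Apply the same replacement to the search text so both sides agree.
SELECT id, body
FROM docs
WHERE slash_tsvector(body)
      @@ plainto_tsquery('english'::regconfig, replace('five/six', '/', ' '));
```

The search text needs the same treatment because the query functions use the same parser, so an unmodified five/six in the query would again be read as a file token.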


Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Need help with full text index configuration

2010-07-28 Thread Brian Hirt
I have some data that can be searched, and it looks like the parser is making 
some assumptions about the data that aren't true in our case and I'm trying to 
figure out how to exclude a token type.   I haven't been able to find the 
answer to my question so far, so I thought I would ask here.

The data I have are English words, and sometimes there are words separated by a
/ without spaces.  The parser finds these and tokenizes them as files.
I'm sure in some situations that's the right assumption, but based on my data,
I know there will never be a file name in the column.

For example, instead of recognizing three asciiword tokens, the parser recognizes
one asciiword and one file.  I'd like a way to have the / just get parsed as
blank.

db=# select * from ts_debug('english','maybe five/six');
   alias   |    description    |  token   |  dictionaries  |  dictionary  |  lexemes
-----------+-------------------+----------+----------------+--------------+------------
 asciiword | Word, all ASCII   | maybe    | {english_stem} | english_stem | {mayb}
 blank     | Space symbols     |          | {}             |              |
 file      | File or path name | five/six | {simple}       | simple       | {five/six}
(3 rows)

I thought that maybe I could create a new configuration and drop the file 
mapping, but that doesn't seem to work either.

db=# CREATE TEXT SEARCH CONFIGURATION public.testd ( COPY = pg_catalog.english );
CREATE TEXT SEARCH CONFIGURATION
db=# ALTER TEXT SEARCH CONFIGURATION testd DROP MAPPING FOR file;
ALTER TEXT SEARCH CONFIGURATION
db=# SELECT * FROM ts_debug('testd','mabye five/six');
   alias   |    description    |  token   |  dictionaries  |  dictionary  | lexemes
-----------+-------------------+----------+----------------+--------------+---------
 asciiword | Word, all ASCII   | mabye    | {english_stem} | english_stem | {maby}
 blank     | Space symbols     |          | {}             |              |
 file      | File or path name | five/six | {}             |              |
(3 rows)


Is there any way to do this?

Thanks for the help in advance.  I'm running 8.4.4.


Re: [GENERAL] Need help with full text index configuration

2010-07-28 Thread Tom Lane
Brian Hirt bh...@mobygames.com writes:
> For example instead of the parser recognizing three asciiword it recognizes
> one asciiword and one file.   I'd like a way to have the / just get parsed as
> blank.

AFAIK the only good way to do that is to write your own parser :-(.
The builtin parser isn't really configurable.  (If you didn't mind
maintaining a private version you could patch its state transition
table manually, but that seems like a PITA.)

For the case at hand it could be a pretty thin frontend to the builtin
text parser --- just change / to space and then call the builtin one.
contrib/test_parser/ might help you get started.
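
For orientation, this is roughly how a custom parser gets wired into a configuration once it exists. It is a sketch only: it assumes the contrib/test_parser install script has been run, which creates a parser named testparser that emits the token types word and blank. The configuration name testcfg follows the example in that module's documentation.

```sql
-- Assumes contrib/test_parser is installed; its script creates "testparser".
-- Look at the raw tokens the custom parser produces.
SELECT * FROM ts_parse('testparser', 'maybe five/six');

-- Build a configuration on top of the custom parser and route its
-- word tokens to the English stemmer.
CREATE TEXT SEARCH CONFIGURATION testcfg ( PARSER = testparser );
ALTER TEXT SEARCH CONFIGURATION testcfg ADD MAPPING FOR word WITH english_stem;

SELECT to_tsvector('testcfg', 'maybe five/six');
```

As far as I understand it, test_parser splits on white space only, so by itself it would keep five/six together as one word. The C code would still have to be changed to treat / as a blank, or to rewrite / and hand the text to the builtin parser as suggested above.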

regards, tom lane



Re: [GENERAL] Need help with full text index configuration

2010-07-28 Thread Brian Hirt
Tom,

Thanks for the quick reply.  Doing a frontend mapping was my next option, since
I really don't care about / or the ability to search on it.  Preventing the
parser from using the file tokenizer seemed like a better solution, so I wanted
to go down that path first (there are other false hits I was worried about too,
like email, etc.).

I'm really confused about what ALTER TEXT SEARCH CONFIGURATION dict DROP
MAPPING FOR file actually does.  The documentation seems to make it sound
like it does what I want, but I guess it does something else.

--brian




Re: [GENERAL] Need help with full text index configuration

2010-07-28 Thread Tom Lane
Brian Hirt bh...@mobygames.com writes:
> I'm really confused about what ALTER TEXT SEARCH CONFIGURATION dict DROP
> MAPPING FOR file actually does.   The documentation seems to make it sound
> like it does what I want, but I guess it does something else.

No, it doesn't affect the parser's behavior at all.  So foo/bar will
still be parsed as a file token.  What the above results in is
dropping file tokens on the floor afterwards, instead of passing them
to some dictionary.  In general the mapping stuff just controls what
dictionary(s) tokens produced by the parser will be routed to.
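
The split between parsing and mapping can be checked directly by comparing the parser's raw output with what a configuration makes of it. A minimal sketch, assuming the testd configuration from earlier in the thread (english copied, file mapping dropped) still exists:

```sql
-- Token types the builtin parser can emit (tokid, alias, description).
SELECT * FROM ts_token_type('default');

-- Raw parser output. No configuration is involved at this stage, so
-- five/six still comes back as a single file token.
SELECT * FROM ts_parse('default', 'maybe five/six');

-- With the file mapping dropped, the token is still parsed as a file but is
-- routed to no dictionary, so it adds no lexeme to the tsvector.
SELECT to_tsvector('testd', 'maybe five/six');

-- Consequently a search for one of the two words has nothing to match
-- against in the tsvector built through testd.
SELECT to_tsvector('testd', 'maybe five/six') @@ to_tsquery('testd', 'six');
```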

regards, tom lane
