Re: [GENERAL] Need help with full text index configuration
Brian,

you have two options:

1. Use your own parser (just modify default)
2. Use replace function, like

postgres=# select to_tsvector( replace('qw/er/ty','/',' '));
     to_tsvector
----------------------
 'er':2 'qw':1 'ty':3
(1 row)

Oleg

On Wed, 28 Jul 2010, Brian Hirt wrote:

> I have some data that can be searched, and it looks like the parser is making some assumptions about the data that aren't true in our case and I'm trying to figure out how to exclude a token type. I haven't been able to find the answer to my question so far, so I thought I would ask here.
>
> The data I have are English words, and sometimes there are words separated by a / without spaces. The parser finds these things and tokenizes them as files. I'm sure in some situations that's the right assumption, but based on my data, I know there will never be a file name in the column.
>
> For example instead of the parser recognizing three asciiword it recognizes one asciiword and one file. I'd like a way to have the / just get parsed as blank.
>
> db=# select * from ts_debug('english','maybe five/six');
>    alias   |    description    |  token   |  dictionaries  |  dictionary  |  lexemes
> -----------+-------------------+----------+----------------+--------------+------------
>  asciiword | Word, all ASCII   | maybe    | {english_stem} | english_stem | {mayb}
>  blank     | Space symbols     |          | {}             |              |
>  file      | File or path name | five/six | {simple}       | simple       | {five/six}
> (3 rows)
>
> I thought that maybe I could create a new configuration and drop the file mapping, but that doesn't seem to work either.
>
> db=# CREATE TEXT SEARCH CONFIGURATION public.testd ( COPY = pg_catalog.english );
> CREATE TEXT SEARCH CONFIGURATION
> db=# ALTER TEXT SEARCH CONFIGURATION testd DROP MAPPING FOR file;
> ALTER TEXT SEARCH CONFIGURATION
> db=# SELECT * FROM ts_debug('testd','mabye five/six');
>    alias   |    description    |  token   |  dictionaries  |  dictionary  | lexemes
> -----------+-------------------+----------+----------------+--------------+---------
>  asciiword | Word, all ASCII   | mabye    | {english_stem} | english_stem | {maby}
>  blank     | Space symbols     |          | {}             |              |
>  file      | File or path name | five/six | {}             |              |
> (3 rows)
>
> Is there any way to do this?
>
> Thanks for the help in advance. I'm running 8.4.4

Regards,
Oleg

Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
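Option 2 from the reply above (replace the slash before calling to_tsvector) could be packaged roughly as follows. This is only a sketch: the table docs, its column body, the function name slash_tsvector and the index name are hypothetical names chosen for illustration, not anything from the thread.

```sql
-- Hypothetical table and column names, for illustration only.
CREATE TABLE docs (id serial PRIMARY KEY, body text);

-- Wrapper: turn '/' into a space before the built-in parser sees the text.
-- Declared IMMUTABLE so it can be used in an expression index; this relies on
-- the two-argument form to_tsvector(regconfig, text) being immutable.
CREATE FUNCTION slash_tsvector(text) RETURNS tsvector AS $$
    SELECT to_tsvector('english', replace($1, '/', ' '));
$$ LANGUAGE sql IMMUTABLE;

-- Expression index built on the wrapper.
CREATE INDEX docs_body_fts ON docs USING gin (slash_tsvector(body));

INSERT INTO docs (body) VALUES ('maybe five/six');

-- The query has to use the same expression for the index to be considered,
-- and the search string gets the same '/' replacement as the indexed text.
SELECT id, body
FROM docs
WHERE slash_tsvector(body)
      @@ plainto_tsquery('english', replace('five/six', '/', ' '));
```

The trade-off is that every query has to go through the same wrapper expression, whereas a custom parser would apply the rule everywhere the configuration is used.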
[GENERAL] Need help with full text index configuration
I have some data that can be searched, and it looks like the parser is making some assumptions about the data that aren't true in our case and I'm trying to figure out how to exclude a token type. I haven't been able to find the answer to my question so far, so I thought I would ask here.

The data I have are English words, and sometimes there are words separated by a / without spaces. The parser finds these things and tokenizes them as files. I'm sure in some situations that's the right assumption, but based on my data, I know there will never be a file name in the column.

For example instead of the parser recognizing three asciiword it recognizes one asciiword and one file. I'd like a way to have the / just get parsed as blank.

db=# select * from ts_debug('english','maybe five/six');
   alias   |    description    |  token   |  dictionaries  |  dictionary  |  lexemes
-----------+-------------------+----------+----------------+--------------+------------
 asciiword | Word, all ASCII   | maybe    | {english_stem} | english_stem | {mayb}
 blank     | Space symbols     |          | {}             |              |
 file      | File or path name | five/six | {simple}       | simple       | {five/six}
(3 rows)

I thought that maybe I could create a new configuration and drop the file mapping, but that doesn't seem to work either.

db=# CREATE TEXT SEARCH CONFIGURATION public.testd ( COPY = pg_catalog.english );
CREATE TEXT SEARCH CONFIGURATION
db=# ALTER TEXT SEARCH CONFIGURATION testd DROP MAPPING FOR file;
ALTER TEXT SEARCH CONFIGURATION
db=# SELECT * FROM ts_debug('testd','mabye five/six');
   alias   |    description    |  token   |  dictionaries  |  dictionary  | lexemes
-----------+-------------------+----------+----------------+--------------+---------
 asciiword | Word, all ASCII   | mabye    | {english_stem} | english_stem | {maby}
 blank     | Space symbols     |          | {}             |              |
 file      | File or path name | five/six | {}             |              |
(3 rows)

Is there any way to do this?

Thanks for the help in advance. I'm running 8.4.4
Re: [GENERAL] Need help with full text index configuration
Brian Hirt bh...@mobygames.com writes:
> For example instead of the parser recognizing three asciiword it recognizes one asciiword and one file. I'd like a way to have the / just get parsed as blank.

AFAIK the only good way to do that is to write your own parser :-(. The builtin parser isn't really configurable. (If you didn't mind maintaining a private version you could patch its state transition table manually, but that seems like a PITA.)

For the case at hand it could be a pretty thin frontend to the builtin text parser --- just change / to space and then call the builtin one. contrib/test_parser/ might help you get started.

regards, tom lane
Re: [GENERAL] Need help with full text index configuration
Tom,

Thanks for the quick reply. Doing a frontend mapping was my next option since I really don't care about / and the ability to search on it. Preventing the parser from using the file tokenizer seemed like a better solution, so I wanted to go down that path first (there are other false hits I was worried about too, like email, etc.).

I'm really confused about what ALTER TEXT SEARCH CONFIGURATION dict DROP MAPPING FOR file actually does. The documentation seems to make it sound like it does what I want, but I guess it does something else.

--brian

On Jul 28, 2010, at 2:06 PM, Tom Lane wrote:

> Brian Hirt bh...@mobygames.com writes:
>> For example instead of the parser recognizing three asciiword it recognizes one asciiword and one file. I'd like a way to have the / just get parsed as blank.
>
> AFAIK the only good way to do that is to write your own parser :-(. The builtin parser isn't really configurable. (If you didn't mind maintaining a private version you could patch its state transition table manually, but that seems like a PITA.)
>
> For the case at hand it could be a pretty thin frontend to the builtin text parser --- just change / to space and then call the builtin one. contrib/test_parser/ might help you get started.
>
> regards, tom lane
Re: [GENERAL] Need help with full text index configuration
Brian Hirt bh...@mobygames.com writes:
> I'm really confused about what ALTER TEXT SEARCH CONFIGURATION dict DROP MAPPING FOR file actually does. The documentation seems to make it sound like it does what I want, but I guess it does something else.

No, it doesn't affect the parser's behavior at all. So foo/bar will still be parsed as a file token. What the above results in is dropping file tokens on the floor afterwards, instead of passing them to some dictionary.

In general the mapping stuff just controls what dictionary(s) tokens produced by the parser will be routed to.

regards, tom lane
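The difference described above can be checked by comparing the two configurations directly. The statements below are a minimal sketch that reuses the testd configuration from earlier in the thread (skip the first two statements if it already exists). No output is shown because it was not run as part of the thread.

```sql
-- Recreate the configuration from the original post.
CREATE TEXT SEARCH CONFIGURATION public.testd ( COPY = pg_catalog.english );
ALTER TEXT SEARCH CONFIGURATION testd DROP MAPPING FOR file;

-- 'english' routes file tokens to the simple dictionary, so five/six is
-- indexed as a single lexeme.
SELECT to_tsvector('english', 'maybe five/six');

-- 'testd' has no mapping for file tokens: the parser still emits five/six as
-- one file token, and that token is then discarded instead of being split.
SELECT to_tsvector('testd', 'maybe five/six');

-- Neither configuration indexes 'six' as a word of its own, so a search for
-- it is not expected to match.
SELECT to_tsvector('testd', 'maybe five/six') @@ to_tsquery('testd', 'six');
```

This is why dropping the mapping removes the file token from the index without making the individual words searchable.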