Hi,
I'm poking in the dark and hope someone has some light...
We need to retrieve part numbers from technical documentation. For now we
have a (long) regular expression to find those in a string. The part
numbers have letters, digits and (redundant) whitespace. Furthermore
authors often used a compressed notation for number ranges with dashes
or slashes, like A123-56 or A123/4.
When searching for part numbers users should be able to enter specific
numbers like A126 (then the text "A123-56" should be found too) or
wildcard searches like "A12?" or "A*". This part number search is a
separate feature apart from regular full text search.
As far as I can see, I have to
- add an extra field for storing part numbers
- create a Tokenizer which recognizes just the part numbers and skips
all other text
- create an Analyzer which expands ranges like A123-56 to A123, A124,
..., A156 and normalizes numbers by removing whitespace
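
To make the expansion step concrete, here is a rough sketch in plain Java
with no Lucene dependency. Everything in it is an assumption for
illustration: the class name PartNumberExpander is hypothetical, the
pattern only covers "letters + digits" numbers, a dash is treated as an
inclusive range, a slash as "first and last only", and the cap of 1000
numbers per range is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene.
final class PartNumberExpander {

    // Assumed shape: letters, up to 9 digits, then optionally '-' or '/'
    // followed by a digit suffix that replaces the tail of the first number.
    private static final Pattern COMPRESSED =
            Pattern.compile("([A-Za-z]+)(\\d{1,9})(?:([-/])(\\d{1,9}))?");

    // Arbitrary safety cap so a malformed range cannot produce huge lists.
    private static final int MAX_RANGE = 1000;

    private PartNumberExpander() {}

    static List<String> expand(String raw) {
        // Normalize by removing all whitespace, e.g. "A 126" becomes "A126".
        String s = raw.replaceAll("\\s+", "");
        List<String> out = new ArrayList<>();

        Matcher m = COMPRESSED.matcher(s);
        if (!m.matches() || m.group(3) == null
                || m.group(4).length() > m.group(2).length()) {
            out.add(s); // plain number or unknown shape: keep as-is
            return out;
        }

        String prefix = m.group(1);
        String first = m.group(2);
        String tail = m.group(4);
        // "123" with tail "56" becomes "156": the tail overwrites the last digits.
        String last = first.substring(0, first.length() - tail.length()) + tail;

        int lo = Integer.parseInt(first);
        int hi = Integer.parseInt(last);
        if (hi < lo || hi - lo >= MAX_RANGE) {
            out.add(s); // not a sensible range: keep as-is
            return out;
        }

        // Keep leading zeros by padding to the width of the first number.
        String format = "%0" + first.length() + "d";
        if (m.group(3).equals("/")) {
            // Assumption: a slash lists two alternatives rather than a range.
            out.add(prefix + first);
            if (hi != lo) {
                out.add(prefix + last);
            }
        } else {
            for (int n = lo; n <= hi; n++) {
                out.add(prefix + String.format(format, n));
            }
        }
        return out;
    }
}
```

Under these assumptions, expand("A123-56") would yield A123 through A156
and expand("A123/4") would yield A123 and A124. Inside an analyzer, each
expanded number could presumably be emitted as a token that keeps the
start and end offsets of the original "A123-56" text, which is what I
would expect highlighting to rely on.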
With this analyzer I hope to get the highlighting to work too (e.g.
"A123-56" highlighted when "A126" was the search term).
Is this the right way? What could I use as a starting point? (I found
org.apache.lucene.analysis.miscellaneous.PatternAnalyzer, which does much
more than I need...)
Thanks for all hints!
Wulf