Hi Karolina,

Yes (of course!). We have an XML element for the part numbers, but up to now not all of them are tagged, so we need regex matching as well...
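A minimal sketch of that mixed approach, assuming a hypothetical `<partno>` element and a simplified stand-in pattern (neither is the real schema or the real regex): tagged part numbers are taken from the markup as-is, and only text outside that element is scanned with the regex.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class TaggedOrRegex {
    // Hypothetical element name and simplified pattern; the real ones are not given here.
    static final String PART_ELEMENT = "partno";
    static final Pattern PART = Pattern.compile("\\b[A-Z]\\d+(?:[-/]\\d+)?\\b");

    /** Collects part numbers: from <partno> elements if tagged, else via regex. */
    static List<String> collect(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<String> parts = new ArrayList<>();
        walk(doc.getDocumentElement(), parts);
        return parts;
    }

    private static void walk(Node node, List<String> parts) {
        if (node.getNodeType() == Node.ELEMENT_NODE && node.getNodeName().equals(PART_ELEMENT)) {
            parts.add(node.getTextContent().trim()); // tagged: trust the markup, do not descend
            return;
        }
        if (node.getNodeType() == Node.TEXT_NODE) {
            Matcher m = PART.matcher(node.getNodeValue()); // untagged: fall back to the regex
            while (m.find()) {
                parts.add(m.group());
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), parts);
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<step>Fit <partno>A123-56</partno> and then A200.</step>";
        System.out.println(collect(xml)); // [A123-56, A200]
    }
}
```

Note that a part number split across several text nodes (e.g. by inline formatting) would be missed by the regex fallback in this sketch.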

On 28.01.2011 13:31, Karolina Bernat wrote:
Hi Wulf,

can I ask whether it is structured documentation (like XML or SGML) you're
dealing with? I ask because I also work with technical documentation and we
do exactly what you're asking for, but with XML data.


On Fri, Jan 28, 2011 at 1:05 PM, Wulf Berschin <bersc...@dosco.de> wrote:

Hi,

I'm poking in the dark and hope someone has some light...

We need to retrieve part numbers from technical documentation. For now we
have a (long) regular expression to find them in a string. The part numbers
contain letters, digits and (redundant) whitespace. Furthermore, authors have
often used a compressed notation for number ranges with dashes or slashes,
like A123-56 or A123/4.
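The actual long expression is not part of this thread, so the pattern below is a hypothetical stand-in: one capital letter, digits that may contain single spaces, and an optional range suffix introduced by "-" or "/". It only illustrates finding the raw matches in running text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PartNumberFinder {
    // Hypothetical stand-in for the real (long) expression:
    // letter, digits with optional single spaces, optional "-"/"/" range suffix.
    static final Pattern PART_NUMBER = Pattern.compile(
            "\\b[A-Z] ?\\d(?: ?\\d)*(?: ?[-/] ?\\d+)?\\b");

    /** Returns every raw (not yet normalized) part number found in the text. */
    static List<String> find(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = PART_NUMBER.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(find("Replace A123-56 and A 12 7, then check A123/4."));
        // [A123-56, A 12 7, A123/4]
    }
}
```

Allowing whitespace inside the number makes the pattern greedy across gaps (e.g. "A123 5 screws" would yield "A123 5"), which is one reason such expressions tend to grow long.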

When searching for part numbers, users should be able to enter specific
numbers like A126 (in which case the text "A123-56" should be found too) or
wildcard searches like "A12?" or "A*". This part number search is a separate
feature, apart from the regular full-text search.
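Lucene's WildcardQuery already supports `?` (exactly one character) and `*` (any sequence of characters) on indexed terms. The plain-Java sketch below only illustrates those semantics against a hypothetical list of already expanded and normalized terms, and shows that the query string must be normalized the same way the terms were.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class WildcardMatch {
    /** Translates "?" (one char) and "*" (any run) into a regex; all else is literal. */
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?') {
                sb.append('.');
            } else if (c == '*') {
                sb.append(".*");
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    /** Returns the indexed terms matched by the user's query. */
    static List<String> search(String query, List<String> indexedTerms) {
        // Normalize the query exactly like the indexed part numbers.
        Pattern p = toRegex(query.replaceAll("\\s+", "").toUpperCase());
        List<String> hits = new ArrayList<>();
        for (String term : indexedTerms) {
            if (p.matcher(term).matches()) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("A123", "A124", "A126", "A1260", "B12");
        System.out.println(search("A12?", terms)); // [A123, A124, A126]
        System.out.println(search("a*", terms));   // [A123, A124, A126, A1260]
    }
}
```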

As far as I can see, I have to

- add an extra field for storing part numbers

- create a Tokenizer which recognizes just the part numbers and skips all
other text

- create an Analyzer which expands ranges like A123-56 to A123, A124, ...,
A156 and normalizes numbers by removing whitespace
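The expansion and normalization steps from the last item can be sketched in plain Java, assuming that the digits after "-" or "/" replace the trailing digits of the first number (so A123-56 means A123..A156 and A123/4 means A123..A124). That reading of the slash notation is an assumption; the real convention may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RangeExpander {
    // Assumed shape of a normalized part number: capital letter(s), digits,
    // optionally "-" or "/" followed by the trailing digits of the last number.
    private static final Pattern RANGE = Pattern.compile("([A-Z]+)(\\d+)(?:[-/](\\d+))?");

    /** Removes all whitespace and upper-cases, e.g. "a 123 - 56" -> "A123-56". */
    static String normalize(String raw) {
        return raw.replaceAll("\\s+", "").toUpperCase();
    }

    /** Expands "A123-56" to A123, A124, ..., A156; other input is only normalized. */
    static List<String> expand(String raw) {
        String s = normalize(raw);
        List<String> out = new ArrayList<>();
        Matcher m = RANGE.matcher(s);
        if (!m.matches() || m.group(3) == null || m.group(3).length() > m.group(2).length()) {
            out.add(s); // single number or unrecognized notation: one term
            return out;
        }
        String prefix = m.group(1), first = m.group(2), suffix = m.group(3);
        // The suffix replaces the trailing digits of the first number:
        // "123" + "56" -> "156", "123" + "4" -> "124".
        String last = first.substring(0, first.length() - suffix.length()) + suffix;
        long from = Long.parseLong(first), to = Long.parseLong(last);
        if (to < from) {
            out.add(s); // descending notation: keep the raw text as one term
            return out;
        }
        for (long n = from; n <= to; n++) {
            // Keep leading zeros, e.g. A007-9 -> A007, A008, A009.
            out.add(prefix + String.format("%0" + first.length() + "d", n));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expand("A123/4"));         // [A123, A124]
        System.out.println(expand("A 12 6"));         // [A126]
        System.out.println(expand("A123-56").size()); // 34
    }
}
```

In a Lucene analysis chain, this logic would typically sit in a TokenFilter after the part-number tokenizer, emitting each expanded term as its own token; that wiring is not shown here.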

With this analyzer I hope to get the highlighting to work too (e.g.
"A123-56" highlighted when "A126" was the search term).
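Highlighting can work because Lucene tokens carry start and end character offsets into the source text. If every term expanded from "A123-56" keeps the offsets of the whole original match, a hit on A126 marks the full "A123-56". The plain-Java sketch below imitates that idea; the `Token` class, the simplified pattern (no inner whitespace) and the `<b>` markup are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OffsetHighlight {
    /** One indexed term plus the character span of the source text it came from. */
    static final class Token {
        final String term;
        final int start, end;
        Token(String term, int start, int end) { this.term = term; this.start = start; this.end = end; }
    }

    // Hypothetical pattern: letter, digits, optional "-"/"/" range suffix.
    static final Pattern PART = Pattern.compile("\\b([A-Z])(\\d+)(?:[-/](\\d+))?\\b");

    /** Emits one token per expanded number; all share the offsets of the original match. */
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = PART.matcher(text);
        while (m.find()) {
            String first = m.group(2);
            String suffix = m.group(3) == null ? "" : m.group(3);
            if (suffix.length() > first.length()) {
                suffix = ""; // unsupported notation: keep the first number only
            }
            String last = first.substring(0, first.length() - suffix.length()) + suffix;
            long from = Long.parseLong(first);
            long to = Math.max(from, Long.parseLong(last));
            for (long n = from; n <= to; n++) {
                String term = m.group(1) + String.format("%0" + first.length() + "d", n);
                tokens.add(new Token(term, m.start(), m.end()));
            }
        }
        return tokens;
    }

    /** Wraps each source span whose expanded terms contain the query in <b>...</b>. */
    static String highlight(String text, String query) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (Token t : tokenize(text)) {
            if (t.term.equals(query) && t.start >= pos) {
                out.append(text, pos, t.start).append("<b>")
                   .append(text, t.start, t.end).append("</b>");
                pos = t.end;
            }
        }
        return out.append(text.substring(pos)).toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("Use A123-56 or A200.", "A126"));
        // Use <b>A123-56</b> or A200.
    }
}
```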

Is this the right way? What could I use as a starting point? (I found
org.apache.lucene.analysis.miscellaneous.PatternAnalyzer, which does much
more than I need...)

Thanks for all hints!

Wulf


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org





--

Best regards,

Wulf Berschin

--

<!-- *****************************************************************
* Wulf Berschin                            Phone:   +49 6221 1486 16 *
* DOSCO Document Systems Consulting GmbH   Fax:     +49 6221 1486 19 *
* Mannheimer Strasse 1                     E-Mail: bersc...@dosco.de *
* 69115 Heidelberg, Germany                http://www.dosco.de       *
* Commercial register: Heidelberg HRB 335122                         *
* Managing director: Robert Erfle                                    *
****************************************************************** -->
