okayndc,

A field configured to use HTMLStripCharFilter as part of its index-time analyzer will strip out HTML tags before index terms are created by the tokenizer, so HTML tags will not be put into the index. As a result, queries for HTML tags cannot match the original documents' HTML tags (in the field configured to use HTMLStripCharFilter, anyway).
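
To make the ordering concrete (strip markup first, then tokenize), here is a toy sketch in plain Python. It is not Lucene code and does not call HTMLStripCharFilter; the names `strip_html`, `tokenize`, and `index_terms` are made up for illustration, and the tokenizer is a simple stand-in that lowercases and splits on non-word characters.

```python
import re
from html.parser import HTMLParser


class _TagStripper(HTMLParser):
    """Keeps only text content and drops tags (toy stand-in for a char filter)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for text between tags; the tags themselves are never stored.
        self.chunks.append(data)


def strip_html(text):
    """Return the text with HTML tags removed (hypothetical helper)."""
    stripper = _TagStripper()
    stripper.feed(text)
    stripper.close()
    return " ".join(stripper.chunks)


def tokenize(text):
    """Toy tokenizer: lowercase, then split into runs of word characters."""
    return re.findall(r"\w+", text.lower())


def index_terms(text):
    """Strip markup first, then tokenize, so tag names never become terms."""
    return tokenize(strip_html(text))


doc = "<p>This is <strong>very</strong> important.</p>"

raw_terms = tokenize(doc)          # no stripping: "p" and "strong" become terms
stripped_terms = index_terms(doc)  # stripping first: only the text survives

print(raw_terms)
print(stripped_terms)
print("strong" in raw_terms, "strong" in stripped_terms)
```

With stripping applied before tokenization, a query term such as `strong` has nothing to match in this document, so there is also nothing for a highlighter to wrap.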
So HTMLStripCharFilter should do what you want.

Steve

From: okayndc [mailto:[email protected]]
Sent: Thursday, April 05, 2012 3:36 PM
To: Steven A Rowe
Cc: [email protected]
Subject: Re: HTML tags and Lucene highlighting

Hello,

I want to ignore HTML tags within a search. I should not be able to search for an HTML tag (ex. <strong>) and get back the highlighted HTML tag (ex. <span class="highlighted"><strong></span>) in a result set.

Thanks

On Thu, Apr 5, 2012 at 3:24 PM, Steven A Rowe <[email protected]> wrote:

Hi okayndc,

What *do* you want?

Steve

-----Original Message-----
From: okayndc [mailto:[email protected]]
Sent: Thursday, April 05, 2012 1:34 PM
To: [email protected]
Subject: HTML tags and Lucene highlighting

Hello,

I currently use Lucene version 3.0 and probably need to upgrade to a more current version soon. The problem I have is that when I test a search for an HTML tag (ex. <strong>), Lucene returns the highlighted HTML tag, which is what I DO NOT want. Is there a way to "filter" HTML tags? I have read up on HTMLStripCharFilter (packaged with Solr) and wondered if this is the way to go.

Any help will be greatly appreciated,
Thanks
