Ironically, I just had to solve this exact problem just 10 minutes ago...

Check into javax.swing.text.html.HTMLEditorKit and
javax.swing.text.html.HTMLDocument. Here's a URL that I found helpful (the site
is Japanese, but the source code is still Java):

http://java-house.jp/ml/archive/j-h-b/037727.html?#_body



"Lichty, Kent" wrote:

> We have a web application that builds pages "on the fly" by reading directly
> from a database. The database contains both normal content and HTML.  We use
> Lucene as our search engine, but I need to figure out how to cause it to NOT
> include content that is within HTML tags. I assume that this entails the
> creation of a custom Analyzer.  Are there any existing Analyzers already out
> there that work like this? Thanks!
>
> ----------  Internet E-mail Confidentiality Disclaimer  ----------
>
> PRIVILEGED / CONFIDENTIAL INFORMATION may be contained in this message.  If
> you are not the addressee indicated in this message or the employee or agent
> responsible for delivering it to the addressee, you are hereby on notice
> that you are in possession of confidential and privileged information.  Any
> dissemination, distribution, or copying of this e-mail is strictly
> prohibited.  In such case, you should destroy this message and kindly notify
> the sender by reply e-mail.  Please advise immediately if you or your
> employer do not consent to Internet email for messages of this kind.
>
> Opinions, conclusions, and other information in this message that do not
> relate to the official business of my firm shall be understood as neither
> given nor endorsed by it.
>
> --
> To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@;jakarta.apache.org>
> For additional commands, e-mail: <mailto:lucene-user-help@;jakarta.apache.org>


--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@;jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@;jakarta.apache.org>

Reply via email to