Ask on the user list; actually, search the archives first.
On Feb 7, 2007, at 6:10 PM, Joe Tang wrote:
My task is to index the keywords in a document. In my case, the
document contains HTML tags, which I don't want to index.
For example:
Input Document:
<div id="tp-wrapper">
<span id="tp-top-right">You are welcome</span>
<div id="tp-tab">
<h1>Testing text</h1>
/images/gui/tab_grey_bkg_lftend.gif
</div>
</div>
Expected Keywords:
keywords:You
keywords:are
keywords:welcome
keywords:Testing
keywords:text
Is there any way I can keep the tags from becoming keywords?
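One common approach is to strip the markup before the text reaches the tokenizer. The sketch below is plain JDK code, not a Lucene API. It assumes a naive regex (`<[^>]*>`) is good enough for this input: it does not handle comments, `<script>` bodies, entities, or `>` inside quoted attribute values, so a real HTML parser is safer for arbitrary pages. The class and method names (`HtmlKeywordSketch`, `stripTags`, `keywords`) are hypothetical. Tokens are split on whitespace without lowercasing, matching the expected keywords above; in Lucene the stripped text could instead be handed to an analyzer such as `WhitespaceAnalyzer`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class HtmlKeywordSketch {
    // Matches anything from '<' to the next '>' (start, end, or self-closing tag).
    // Naive assumption: no comments, scripts, or '>' inside attribute values.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    /** Replaces every tag with a space so adjacent text runs are not glued together. */
    static String stripTags(String html) {
        return TAG.matcher(html).replaceAll(" ");
    }

    /** Splits the tag-free text on whitespace, dropping empty tokens. */
    static List<String> keywords(String html) {
        List<String> out = new ArrayList<>();
        for (String token : stripTags(html).trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String html =
              "<div id=\"tp-wrapper\">\n"
            + "<span id=\"tp-top-right\">You are welcome</span>\n"
            + "<div id=\"tp-tab\">\n"
            + "<h1>Testing text</h1>\n"
            + "</div>\n"
            + "</div>";
        // Prints one "keywords:<token>" line per token.
        for (String k : keywords(html)) {
            System.out.println("keywords:" + k);
        }
    }
}
```

Note that tag stripping only removes markup. A bare line of text such as `/images/gui/tab_grey_bkg_lftend.gif` is not inside a tag, so it would survive as a token and would need a separate filter (for example, dropping tokens that look like paths).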
--
View this message in context: http://www.nabble.com/How-to-not-tokenize-HTML-tag-from-input-string-tf3190611.html#a8857238
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]