Would it be possible to implement an Analyzer that filters the HTML markup out of an
HTML page? As a result I would have only the text, free of any tagging.
The difficulty is that, in general, an HTML document contains multiple fields: at least "title" and "body", and perhaps others from metadata. An Analyzer operates on only a single field at a time, so it cannot split its input into multiple fields.
But yes, it would be possible to strip the bracketed text (the tags) out. My recommendation would be to implement a Tokenizer that does this and use it to feed the analysis process.
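
To make the idea concrete, here is a rough sketch in plain Java. It deliberately does not use the Lucene API: the class name HtmlTextSketch and its methods are hypothetical, made up for illustration only. It assumes reasonably well-formed HTML, pulls "title" and "body" out as separate strings before any analysis happens, and drops everything between "<" and ">". A regular expression is not a real HTML parser, so entities, scripts, and a ">" inside an attribute value are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: splits an HTML page into
// per-field text and strips the tags from each field.
class HtmlTextSketch {
    // Anything from '<' up to the next '>' is treated as a tag.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern TITLE = Pattern.compile(
        "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern BODY = Pattern.compile(
        "<body[^>]*>(.*?)</body>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Replace each tag with a space, then collapse runs of whitespace.
    static String stripTags(String html) {
        return TAG.matcher(html).replaceAll(" ").replaceAll("\\s+", " ").trim();
    }

    // Return the tag-free content of the first match, or "" if absent.
    private static String extract(Pattern field, String html) {
        Matcher m = field.matcher(html);
        return m.find() ? stripTags(m.group(1)) : "";
    }

    static String title(String html) { return extract(TITLE, html); }

    static String body(String html) { return extract(BODY, html); }

    // Naive whitespace tokenizer standing in for the real analysis step.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Hello</title></head>"
                    + "<body><p>Some <b>bold</b> text</p></body></html>";
        System.out.println("title: " + title(html));             // title: Hello
        System.out.println("body tokens: " + tokenize(body(html))); // body tokens: [Some, bold, text]
    }
}
```

Each extracted string would then go into its own field and be analyzed separately. Note that replacing a tag with a space splits a word that has markup in the middle of it; whether that is desirable depends on the content.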
Erik
