Re: HTML tag filter...

Erik Hatcher Sat, 10 Jan 2004 13:33:36 -0800

On Jan 10, 2004, at 1:43 PM, [EMAIL PROTECTED] wrote:

Would it be possible to implement an Analyzer that filters the HTML markup out of an HTML page? As a result, I would have only the text, free of any tags.

The dilemma is that, in a general sense, an HTML document contains multiple fields: at least "title" and "body", and perhaps others from metadata. An Analyzer operates on only a single field at a time, so it cannot split its input into multiple fields.
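To illustrate the point, here is a minimal sketch of pulling "title" and "body" apart before any analysis happens, so that each piece can be indexed as its own field. The class name HtmlFieldSplitter and its split method are hypothetical and not part of Lucene. The regular expressions are a simplification that assumes reasonably well-formed HTML; a real HTML parser would be more robust.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a Lucene class): separates an HTML page into
// named fields so each one can be analyzed on its own.
class HtmlFieldSplitter {
    private static final int FLAGS = Pattern.CASE_INSENSITIVE | Pattern.DOTALL;
    private static final Pattern TITLE =
        Pattern.compile("<title[^>]*>(.*?)</title>", FLAGS);
    private static final Pattern BODY =
        Pattern.compile("<body[^>]*>(.*?)</body>", FLAGS);

    // Returns a map with "title" and/or "body" entries when they are found.
    // The body value still contains its inner markup.
    static Map<String, String> split(String html) {
        Map<String, String> fields = new LinkedHashMap<>();
        extract(fields, "title", TITLE, html);
        extract(fields, "body", BODY, html);
        return fields;
    }

    private static void extract(Map<String, String> fields, String name,
                                Pattern pattern, String html) {
        Matcher m = pattern.matcher(html);
        if (m.find()) {
            fields.put(name, m.group(1).trim());
        }
    }
}
```

Each map entry would then be handed to the indexer as a separate field, which is the step an Analyzer cannot perform by itself.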

But, yes, it would be possible to strip bracketed text out. My recommendation would be to implement a Tokenizer that does this and can be used to feed an analysis process.
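A rough sketch of the stripping step, written as a plain java.io Reader wrapper rather than as an actual Lucene Tokenizer. The class name TagStrippingReader and the strip helper are hypothetical. The approach is deliberately naive: it drops everything between '<' and '>', so it does not handle comments, script blocks, character entities, or a '>' inside an attribute value.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper (not a Lucene class): passes characters through
// unchanged, except that anything between '<' and '>' is dropped.
class TagStrippingReader extends FilterReader {
    private boolean inTag = false;

    TagStrippingReader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int c;
        while ((c = in.read()) != -1) {
            if (inTag) {
                if (c == '>') {
                    inTag = false;
                    return ' '; // emit a space so adjacent words do not merge
                }
            } else if (c == '<') {
                inTag = true;
            } else {
                return c;
            }
        }
        return -1;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int n = 0;
        while (n < len) {
            int c = read();
            if (c == -1) {
                break;
            }
            buf[off + n++] = (char) c;
        }
        return n == 0 ? -1 : n;
    }

    // Convenience method for trying the reader out on a string.
    static String strip(String html) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new TagStrippingReader(new StringReader(html))) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

Since tokenizers consume a Reader, a wrapper like this could sit in front of an existing one, or the same state machine could be moved into a custom Tokenizer as suggested above.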

Erik


