Donald Van den Driessche created CONNECTORS-1557:
----------------------------------------------------
Summary: HTML Tag extractor
Key: CONNECTORS-1557
URL: https://issues.apache.org/jira/browse/CONNECTORS-1557
Project: ManifoldCF
Issue Type: New Feature
Reporter: Donald Van den Driessche
I wrote an HTML Tag extractor, based on the HTML Extractor.
I needed to extract specific HTML tags and transfer them to their own field in
my output repository.
Input
* Englobing tag (CSS selector)
* Blacklist (CSS selector)
* Fieldmapping (CSS selector)
* Strip HTML
Process
* Retrieve Englobing tag
* Remove blacklist
* Map each CSS selector in Fieldmapping to its own field (an array if there
are multiple matches) and strip HTML (if requested)
* Englobing tag minus blacklist: strip HTML (if requested) and return as
output (content)
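
To make the steps above concrete, here is a rough sketch of the same flow. It is not the connector's source code: the names `to_xpath`, `render` and `extract` are hypothetical, the standard-library `xml.etree.ElementTree` stands in for a real HTML parser (so the input must be well-formed), and only a tiny subset of CSS selectors (`tag`, `#id`, `.class`, `tag#id`, `tag.class`) is handled. It also assumes that the field mapping is applied after the blacklist has been removed, following the order of the list.

```python
import copy
import xml.etree.ElementTree as ET


def to_xpath(selector):
    """Translate a tiny CSS subset into ElementTree's limited XPath.

    Assumption: only tag, #id, .class, tag#id and tag.class are supported,
    and a class must match the whole class attribute exactly.
    """
    if "#" in selector:
        tag, _, value = selector.partition("#")
        attr = "id"
    elif "." in selector:
        tag, _, value = selector.partition(".")
        attr = "class"
    else:
        return ".//" + selector
    return ".//%s[@%s='%s']" % (tag or "*", attr, value)


def render(element, strip_html):
    """Return the element as plain text or as serialized markup."""
    if strip_html:
        # Join all text nodes with spaces and collapse runs of whitespace.
        return " ".join(" ".join(element.itertext()).split())
    clone = copy.copy(element)
    clone.tail = None  # do not serialize text that follows the element
    return ET.tostring(clone, encoding="unicode")


def extract(html, englobing, blacklist, field_mapping, strip_html=True):
    root = ET.fromstring(html)

    # 1. Retrieve the englobing tag (searched below the root element).
    scope = root.find(to_xpath(englobing))
    if scope is None:
        return {"content": "", "fields": {}}

    # 2. Remove blacklisted elements. Note: in this sketch, text directly
    #    following a removed element (its tail) is dropped as well.
    parents = {child: parent for parent in scope.iter() for child in parent}
    for selector in blacklist:
        for node in scope.findall(to_xpath(selector)):
            parents[node].remove(node)

    # 3. Map each selector to its own field; a list if it matches repeatedly.
    fields = {}
    for name, selector in field_mapping.items():
        values = [render(m, strip_html) for m in scope.findall(to_xpath(selector))]
        if values:
            fields[name] = values[0] if len(values) == 1 else values

    # 4. The englobing tag minus the blacklist becomes the content.
    return {"content": render(scope, strip_html), "fields": fields}
```

A call such as `extract(page, "div#main", ["div.ads"], {"title": "h1.title"})` would return a dictionary with the cleaned `content` and a `fields` entry per mapping. A real implementation would more likely rely on a full HTML parser with proper CSS selector support.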
How can I best deliver the source code?
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)