Donald Van den Driessche created CONNECTORS-1550:
----------------------------------------------------

             Summary: HTML Tag mapping
                 Key: CONNECTORS-1550
                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1550
             Project: ManifoldCF
          Issue Type: Wish
          Components: Elastic Search connector, Tika extractor, Web connector
    Affects Versions: ManifoldCF 2.10
            Reporter: Donald Van den Driessche


I’ll be crawling a website with the standard Web connector. I want to extract 
only certain HTML tags, such as <h1>, <h2> and <p>.
I’ve set up an HTML extractor transformation connector and the internal Tika 
transformation connector, but I can’t find anywhere to map the content of 
these tags to fields in the output.
 
Do I have to write my own transformation connector to extract the content of 
these tags, or is there a built-in solution?
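For illustration only, the extraction requested here (the text of every <h1>, <h2> and <p>, grouped by tag so that each group could be mapped to its own output field) can be sketched in plain Java. This is a minimal standalone sketch, not ManifoldCF's transformation connector API: the class TagExtractor and its method extract are hypothetical names, the parser is the JDK's built-in javax.swing.text.html.parser (a lenient HTML 3.2 parser, not an HTML5-compliant one), and one wanted tag nested inside another is not handled.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical standalone helper: NOT part of ManifoldCF's connector API.
public class TagExtractor {

    // Tags whose text should be captured; these are the tags named in this issue.
    private static final List<HTML.Tag> WANTED =
            List.of(HTML.Tag.H1, HTML.Tag.H2, HTML.Tag.P);

    // Returns one entry per wanted tag name ("h1", "h2", "p"), each holding the
    // trimmed text of every occurrence of that tag, in document order.
    public static Map<String, List<String>> extract(String html) throws IOException {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        for (HTML.Tag tag : WANTED) {
            fields.put(tag.toString(), new ArrayList<>());
        }

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private HTML.Tag current;                      // wanted tag we are inside, or null
            private final StringBuilder text = new StringBuilder();

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (WANTED.contains(tag)) {
                    current = tag;
                    text.setLength(0);
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (current != null) {
                    text.append(data);                     // collect text only inside a wanted tag
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == current) {
                    fields.get(tag.toString()).add(text.toString().trim());
                    current = null;
                }
            }
        };

        // The JDK's built-in HTML parser; 'true' ignores any charset declaration in the page.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return fields;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><h1>Title</h1><h2>Subtitle</h2>"
                + "<p>First paragraph</p><div>ignored</div><p>Second</p></body></html>";
        System.out.println(extract(html));
    }
}
```

Keying the result by tag name keeps the mapping step simple: each key (h1, h2, p) could become a separate field in the document sent to the Elastic Search output connection.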



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
