[ https://issues.apache.org/jira/browse/NIFI-5534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16612604#comment-16612604 ]
Brandon DeVries commented on NIFI-5534:
---------------------------------------

[~paulvid3], this is interesting, but I think it might be more appropriate to limit your processor to just the extraction portion. In other words, leave fetching the HTML to be parsed to GetHTTP / InvokeHTTP / whatever, and simply operate on the contents of the FlowFile coming in to your processor...

> Create a Nifi Processor using Boilerpipe Article Extractor
> ----------------------------------------------------------
>
>                 Key: NIFI-5534
>                 URL: https://issues.apache.org/jira/browse/NIFI-5534
>             Project: Apache NiFi
>          Issue Type: New Feature
>            Reporter: Paul Vidal
>            Priority: Minor
>              Labels: github-import
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Using the boilerpipe library ([https://code.google.com/archive/p/boilerpipe/]),
> I created a simple processor that reads the content of a URL and extracts
> its text into a flowfile.
> I think it is a good complement to the HTML nar bundle.
>
> Link to my implementation:
> https://github.com/paulvid/nifi/tree/NIFI-5534/nifi-nar-bundles/nifi-html-bundle/nifi-html-processors/
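
To illustrate the suggestion in the comment above, here is a minimal sketch of an extraction step that only transforms content it is handed and does no fetching of its own. It uses plain Java streams rather than the NiFi API, and every name in it (`ContentOnlyExtraction`, `extract`, `stripTags`) is hypothetical. The naive `stripTags` function is only a placeholder for whatever extractor is plugged in (for example Boilerpipe's article extractor); it is not how Boilerpipe works.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

class ContentOnlyExtraction {

    // Reads HTML from `in`, applies `extractor`, and writes the resulting
    // text to `out`. No networking happens here: fetching the page is
    // assumed to be done by an upstream step.
    static void extract(InputStream in, OutputStream out,
                        Function<String, String> extractor) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        String html = new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        String text = extractor.apply(html);
        out.write(text.getBytes(StandardCharsets.UTF_8));
    }

    // Placeholder extractor (an assumption for this sketch): replaces tags
    // with spaces and collapses whitespace. A real processor would call an
    // article-extraction library here instead.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) throws IOException {
        byte[] incoming = "<html><body><p>Hello, NiFi</p></body></html>"
                .getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        extract(new ByteArrayInputStream(incoming), out,
                ContentOnlyExtraction::stripTags);
        System.out.println(new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

Because the step takes an input stream and produces an output stream, it can sit behind any source of HTML (an HTTP fetch, a file, a message queue) without changes, which is the point of separating fetching from extraction.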