[ https://issues.apache.org/jira/browse/NIFI-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15080531#comment-15080531 ]
Joseph Witt commented on NIFI-1156:
-----------------------------------
Hello [~jeremy.dyer], thanks for this contribution. We will need to update it
to include the MIT license reference in nifi-assembly/LICENSE, and perhaps
references for some transitive deps too (I haven't looked yet). The contrib
and tests all look great. I can see how this may be useful, particularly in
combination with the Http Request/Response processors. However, I would like
to see more documentation added to the processor description, since that shows
up in the end-user documentation. Also, do you have a template or use case in
mind that you can share? It would be good to round this out with more detailed
information for end users.
Thanks
Joe
> HTML Parsing Processors Bundle
> ------------------------------
>
> Key: NIFI-1156
> URL: https://issues.apache.org/jira/browse/NIFI-1156
> Project: Apache NiFi
> Issue Type: New Feature
> Components: Core Framework
> Reporter: Jeremy Dyer
> Priority: Minor
>
> NiFi provides the ability to ingest HTML but lacks a convenient way to
> interact with that HTML once it has entered the flow. There should be an HTML
> Processing Bundle that provides mechanisms for manipulating and interacting
> with HTML data once it has entered the flow. Jsoup (http://jsoup.org/) seems
> like a logical tool to use since it is mature and has an MIT license, which
> would allow it to be incorporated into NiFi.
> “GetHTMLElement” should use the CSS selector-syntax
> (http://www.w3schools.com/cssref/css_selectors.asp) built into Jsoup to
> extract 0-N HTML elements from the original HTML input. This processor should
> support a delimited string of selectors allowing the user to build compound
> HTML element output. Each HTML element (or compound element result) extracted
> will create a new Flowfile where the element will be in either the Flowfile
> content or an attribute depending on the user configuration.
> “ModifyHTMLElement” should provide the ability to modify the original input
> HTML and overwrite any existing element values. The HTML element that will be
> modified can be selected by using the CSS selector-syntax.
> “PutHTMLElement” should provide the ability to put a new HTML element
> anywhere in the original input HTML using CSS selector-syntax to indicate the
> position that the new HTML element should be placed.
> There seems to be potential for adding more processors, but this seems like
> a good start. Since there is a dependency on Jsoup and potential for more
> processors to come, I think it makes sense to add this logic as its own nar
> bundle, but I could be wrong.
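
For illustration, the three operations described above can be sketched directly against Jsoup's selector API. This is only a rough sketch under stated assumptions, not the code from the contribution: the class name HtmlElementSketch and its three helper methods are hypothetical, the Jsoup library is assumed to be on the classpath, and none of the NiFi processor plumbing (FlowFiles, properties, relationships) is shown.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class; the names here are illustrative only.
public class HtmlElementSketch {

    // "GetHTMLElement" idea: return the outer HTML of each element (0-N)
    // matched by the CSS selector. A processor could emit one FlowFile per entry.
    static List<String> getElements(String html, String cssSelector) {
        Document doc = Jsoup.parse(html);
        List<String> results = new ArrayList<>();
        for (Element element : doc.select(cssSelector)) {
            results.add(element.outerHtml());
        }
        return results;
    }

    // "ModifyHTMLElement" idea: overwrite the text of every matched element
    // and return the modified document.
    static String modifyElements(String html, String cssSelector, String newText) {
        Document doc = Jsoup.parse(html);
        for (Element element : doc.select(cssSelector)) {
            element.text(newText);
        }
        return doc.outerHtml();
    }

    // "PutHTMLElement" idea: append new HTML inside the first matched element.
    // The input is returned (re-serialized) unchanged if nothing matches.
    static String putElement(String html, String cssSelector, String newHtml) {
        Document doc = Jsoup.parse(html);
        Element target = doc.select(cssSelector).first(); // null when no match
        if (target != null) {
            target.append(newHtml);
        }
        return doc.outerHtml();
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Old title</h1>"
                + "<ul id=\"items\"><li>a</li><li>b</li></ul></body></html>";

        System.out.println(getElements(html, "li"));
        System.out.println(modifyElements(html, "h1", "New title"));
        System.out.println(putElement(html, "ul#items", "<li>c</li>"));
    }
}
```

Appending inside the selected element is just one possible reading of "position" for the put case; Jsoup's Element also offers prepend, before and after, so a real processor would presumably expose that choice as a property.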