[
https://issues.apache.org/jira/browse/NIFI-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15007716#comment-15007716
]
ASF GitHub Bot commented on NIFI-1156:
--------------------------------------
Github user olegz commented on a diff in the pull request:
https://github.com/apache/nifi/pull/124#discussion_r45008293
--- Diff: nifi-nar-bundles/nifi-html-bundle/nifi-html-processors/pom.xml ---
@@ -0,0 +1,59 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+ http://www.apache.org/licenses/LICENSE-2.0
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+ <modelVersion>4.0.0</modelVersion>
+
+ <parent>
+ <groupId>org.apache.nifi</groupId>
+ <artifactId>nifi-html-bundle</artifactId>
+ <version>0.4.0-SNAPSHOT</version>
+ </parent>
+
+ <artifactId>nifi-html-processors</artifactId>
+ <description>Support for parsing HTML documents</description>
+
+ <dependencies>
+ <dependency>
+ <groupId>org.jsoup</groupId>
+ <artifactId>jsoup</artifactId>
+ <version>1.8.3</version>
+ </dependency>
+ <dependency>
+ <groupId>org.apache.nifi</groupId>
+ <artifactId>nifi-api</artifactId>
+ </dependency>
+ <dependency>
+ <groupId>org.apache.nifi</groupId>
+ <artifactId>nifi-processor-utils</artifactId>
+ </dependency>
+ <dependency>
+ <groupId>org.apache.nifi</groupId>
+ <artifactId>nifi-mock</artifactId>
+ <scope>test</scope>
+ </dependency>
+ <dependency>
+ <groupId>org.slf4j</groupId>
+ <artifactId>slf4j-simple</artifactId>
+ <scope>test</scope>
+ </dependency>
+ <dependency>
+ <groupId>junit</groupId>
+ <artifactId>junit</artifactId>
+ <version>4.11</version>
--- End diff --
Does the main POM declare JUnit 4.12? If so, can we use that version here instead of 4.11?
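If the parent POM does manage the JUnit version (an assumption; whether it pins 4.12 is not confirmed here), this module could drop the explicit version and inherit it, the same way the nifi-api and nifi-processor-utils dependencies above omit theirs. A rough sketch, where the test scope is also assumed because the diff is cut off before that line:

```xml
<!-- Sketch only: assumes the parent POM's dependencyManagement section declares the JUnit version. -->
<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <!-- No <version> element: it would be inherited from the parent, if declared there. -->
    <scope>test</scope>
</dependency>
```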
> HTML Parsing Processors Bundle
> ------------------------------
>
> Key: NIFI-1156
> URL: https://issues.apache.org/jira/browse/NIFI-1156
> Project: Apache NiFi
> Issue Type: New Feature
> Components: Core Framework
> Reporter: Jeremy Dyer
> Priority: Minor
>
> NiFi provides the ability to ingest HTML but lacks a convenient way to
> interact with that HTML once it has entered the flow. There should be an HTML
> Processing Bundle that provides mechanisms for manipulating and interacting
> with HTML data once it has entered the flow. Jsoup (http://jsoup.org/) seems
> like a logical tool to use, since it is mature and has an MIT license, which
> would allow it to be incorporated into NiFi.
> “GetHTMLElement” should use the CSS selector-syntax
> (http://www.w3schools.com/cssref/css_selectors.asp) built into Jsoup to
> extract 0-N HTML elements from the original HTML input. This processor should
> support a delimited string of selectors allowing the user to build compound
> HTML element output. Each HTML element (or compound element result) extracted
> will create a new Flowfile where the element will be in either the Flowfile
> content or an attribute depending on the user configuration.
> “ModifyHTMLElement” should provide the ability to modify the original input
> HTML and overwrite any existing element values. The HTML element that will be
> modified can be selected by using the CSS selector-syntax.
> “PutHTMLElement” should provide the ability to put a new HTML element
> anywhere in the original input HTML using CSS selector-syntax to indicate the
> position that the new HTML element should be placed.
> There seems to be potential for adding more processors, but this seems like
> a good start. Since there is a dependency on Jsoup and potential for more
> processors to come, I think it makes sense to add this logic as its own nar
> bundle, but I could be wrong.