exceptionfactory commented on a change in pull request #5324:
URL: https://github.com/apache/nifi/pull/5324#discussion_r785171728
##########
File path:
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ValidateXml.java
##########
@@ -64,26 +72,40 @@
@WritesAttribute(attribute = "validatexml.invalid.error", description =
"If the flow file is routed to the invalid relationship "
+ "the attribute will contain the error message resulting from the
validation failure.")
})
-@CapabilityDescription("Validates the contents of FlowFiles against a
user-specified XML Schema file")
+@CapabilityDescription("Validates XML contained in a FlowFile. By default, the
XML is contained in the FlowFile content. If the 'XML Source Attribute'
property is set, the XML to be validated "
+ + "is contained in the specified attribute. It is not recommended to
use attributes to hold large XML documents; doing so could adversely affect
system performance. "
+ + "Full schema validation is performed if the processor is configured
with the XSD schema details. Otherwise, the only validation performed is "
+ + "to ensure the XML syntax is correct and well-formed, e.g. all
opening tags are properly closed.")
+@SystemResourceConsideration(resource = SystemResource.MEMORY, description =
"While this processor supports processing XML within attributes, it is strongly
discouraged to hold "
+ + "large amounts of data in attributes. In general, attribute values
should be as small as possible and hold no more than a couple hundred
characters.")
public class ValidateXml extends AbstractProcessor {
public static final String ERROR_ATTRIBUTE_KEY =
"validatexml.invalid.error";
public static final PropertyDescriptor SCHEMA_FILE = new
PropertyDescriptor.Builder()
.name("Schema File")
- .description("The path to the Schema file that is to be used for
validation")
- .required(true)
+ .displayName("Schema File")
+ .description("The file path or URL to the XSD Schema file that is
to be used for validation. If this property is blank, only XML syntax/structure
will be validated.")
+ .required(false)
.expressionLanguageSupported(ExpressionLanguageScope.VARIABLE_REGISTRY)
.identifiesExternalResource(ResourceCardinality.SINGLE,
ResourceType.FILE, ResourceType.URL)
.build();
+ public static final PropertyDescriptor XML_SOURCE_ATTRIBUTE = new
PropertyDescriptor.Builder()
+ .name("XML Source Attribute")
Review comment:
Rather than making this property an attribute name, what do you think
about changing it to `Schema` and supporting expression language using FlowFile
attributes? That would allow direct configuration of the Schema as a processor
property, while also giving the option for FlowFile attribute based validation.
##########
File path:
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ValidateXml.java
##########
@@ -134,32 +157,73 @@ public void onTrigger(final ProcessContext context, final
ProcessSession session
}
final Schema schema = schemaRef.get();
- final Validator validator = schema.newValidator();
+ final Validator validator = schema == null ? null :
schema.newValidator();
final ComponentLog logger = getLogger();
+ final boolean attributeContainsXML =
context.getProperty(XML_SOURCE_ATTRIBUTE).isSet();
for (FlowFile flowFile : flowFiles) {
final AtomicBoolean valid = new AtomicBoolean(true);
- final AtomicReference<Exception> exception = new
AtomicReference<Exception>(null);
-
- session.read(flowFile, new InputStreamCallback() {
- @Override
- public void process(final InputStream in) throws IOException {
- try {
- validator.validate(new StreamSource(in));
- } catch (final IllegalArgumentException | SAXException e) {
- valid.set(false);
- exception.set(e);
+ final AtomicReference<Exception> exception = new
AtomicReference<>(null);
+ SafeXMLConfiguration safeXMLConfiguration = new
SafeXMLConfiguration();
+ safeXMLConfiguration.setValidating(false);
+
+ try {
+ DocumentBuilder docBuilder =
safeXMLConfiguration.createDocumentBuilder();
+
+ if (attributeContainsXML) {
+ // If XML source attribute is set, validate attribute value
+ String xml =
flowFile.getAttribute(context.getProperty(XML_SOURCE_ATTRIBUTE).evaluateAttributeExpressions().getValue());
+ ByteArrayInputStream bais = new
ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
+
+ if (validator != null) {
+ // If schema is provided, validator will be non-null
+ validator.validate(new StreamSource(bais));
+ } else {
+ // Only verify that the XML is well-formed; no schema
check
+ docBuilder.parse(bais);
}
Review comment:
Since this logic is essentially the same as the logic in the callback in
the `else` block, it should be possible to create a shared function that takes
an `InputStream`, which could then be called as needed.
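A rough sketch of what such a shared function might look like, using only JDK classes. The names `ValidateXmlSketch`, `validate`, and `isValid` are hypothetical and not part of the PR, and a plain `DocumentBuilderFactory` with secure processing enabled stands in for `SafeXMLConfiguration` so that the example is self-contained. The attribute branch would wrap the attribute value in a `ByteArrayInputStream`, and the content branch would pass the stream from `session.read`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical holder class for illustration only; not part of the PR.
final class ValidateXmlSketch {

    // Shared logic for both branches: run full schema validation when a
    // Validator is available, otherwise only check well-formedness.
    static void validate(final InputStream in, final Validator validator, final DocumentBuilder docBuilder)
            throws IOException, SAXException {
        if (validator != null) {
            // A schema was configured, so validate against it
            validator.validate(new StreamSource(in));
        } else {
            // No schema: parsing alone fails on malformed XML
            docBuilder.parse(in);
        }
    }

    // Convenience wrapper for the attribute case: the XML arrives as a String.
    static boolean isValid(final String xml, final Validator validator) {
        try {
            // Assumption: stand-in for SafeXMLConfiguration.createDocumentBuilder()
            final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            final DocumentBuilder docBuilder = factory.newDocumentBuilder();
            try (InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
                validate(in, validator, docBuilder);
            }
            return true;
        } catch (final IOException | SAXException | ParserConfigurationException | IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(final String[] args) {
        // No Validator supplied, so only well-formedness is checked
        System.out.println(isValid("<a><b/></a>", null));
        System.out.println(isValid("<a><b></a>", null));
    }
}
```

In the processor itself, the `try`/`catch` would presumably stay in `onTrigger` so that `valid` and `exception` can still be set per FlowFile; only the `validate` method would be shared.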