exceptionfactory commented on a change in pull request #5324:
URL: https://github.com/apache/nifi/pull/5324#discussion_r785171728



##########
File path: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ValidateXml.java
##########
@@ -64,26 +72,40 @@
     @WritesAttribute(attribute = "validatexml.invalid.error", description = "If the flow file is routed to the invalid relationship "
             + "the attribute will contain the error message resulting from the validation failure.")
 })
-@CapabilityDescription("Validates the contents of FlowFiles against a user-specified XML Schema file")
+@CapabilityDescription("Validates XML contained in a FlowFile. By default, the XML is contained in the FlowFile content. If the 'XML Source Attribute' property is set, the XML to be validated "
+        + "is contained in the specified attribute. It is not recommended to use attributes to hold large XML documents; doing so could adversely affect system performance. "
+        + "Full schema validation is performed if the processor is configured with the XSD schema details. Otherwise, the only validation performed is "
+        + "to ensure the XML syntax is correct and well-formed, e.g. all opening tags are properly closed.")
+@SystemResourceConsideration(resource = SystemResource.MEMORY, description = "While this processor supports processing XML within attributes, it is strongly discouraged to hold "
+        + "large amounts of data in attributes. In general, attribute values should be as small as possible and hold no more than a couple hundred characters.")
 public class ValidateXml extends AbstractProcessor {
 
     public static final String ERROR_ATTRIBUTE_KEY = "validatexml.invalid.error";
 
     public static final PropertyDescriptor SCHEMA_FILE = new PropertyDescriptor.Builder()
             .name("Schema File")
-            .description("The path to the Schema file that is to be used for validation")
-            .required(true)
+            .displayName("Schema File")
+            .description("The file path or URL to the XSD Schema file that is to be used for validation. If this property is blank, only XML syntax/structure will be validated.")
+            .required(false)
             .expressionLanguageSupported(ExpressionLanguageScope.VARIABLE_REGISTRY)
             .identifiesExternalResource(ResourceCardinality.SINGLE, ResourceType.FILE, ResourceType.URL)
             .build();
+    public static final PropertyDescriptor XML_SOURCE_ATTRIBUTE = new PropertyDescriptor.Builder()
+            .name("XML Source Attribute")

Review comment:
       Rather than making this property an attribute name, what do you think 
about changing it to `Schema` and supporting expression language using FlowFile 
attributes?  That would allow direct configuration of the Schema as a processor 
property, while also giving the option for FlowFile attribute based validation.
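A minimal sketch of what a `Schema` property could do per FlowFile. It assumes the property value, after expression language has been evaluated against the FlowFile's attributes, is the XSD text itself. NiFi's property and expression language APIs are not used here. A plain `Map` stands in for the FlowFile attributes, and the names `InlineSchemaSketch`, `compileSchema`, and `conforms` are hypothetical. Only standard `javax.xml.validation` calls are involved.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class InlineSchemaSketch {

    // Hypothetical: 'xsd' stands for the value of a 'Schema' property after expression
    // language has been evaluated against the FlowFile's attributes.
    static Schema compileSchema(final String xsd) throws SAXException {
        // A new factory per call, since SchemaFactory is not thread-safe
        final SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return factory.newSchema(new StreamSource(new StringReader(xsd)));
    }

    // Returns false when the XML violates the schema or is not well-formed
    static boolean conforms(final Schema schema, final String xml) throws IOException {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (final SAXException e) {
            return false;
        }
    }

    public static void main(final String[] args) throws Exception {
        // Stand-in for FlowFile attributes; a property value of "${xsd}" would resolve to this entry
        final Map<String, String> attributes = Map.of("xsd",
                "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
                + "<xs:element name=\"order\" type=\"xs:int\"/></xs:schema>");

        final Schema schema = compileSchema(attributes.get("xsd"));
        System.out.println(conforms(schema, "<order>42</order>"));  // true
        System.out.println(conforms(schema, "<order>abc</order>")); // false: not an xs:int
    }
}
```

Compiling a `Schema` for every FlowFile can be relatively expensive. If this direction is taken, it may be worth caching compiled schemas keyed by the resolved property value.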

##########
File path: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ValidateXml.java
##########
@@ -134,32 +157,73 @@ public void onTrigger(final ProcessContext context, final ProcessSession session
         }
 
         final Schema schema = schemaRef.get();
-        final Validator validator = schema.newValidator();
+        final Validator validator = schema == null ? null : schema.newValidator();
         final ComponentLog logger = getLogger();
+        final boolean attributeContainsXML = context.getProperty(XML_SOURCE_ATTRIBUTE).isSet();

         for (FlowFile flowFile : flowFiles) {
             final AtomicBoolean valid = new AtomicBoolean(true);
-            final AtomicReference<Exception> exception = new AtomicReference<Exception>(null);
-
-            session.read(flowFile, new InputStreamCallback() {
-                @Override
-                public void process(final InputStream in) throws IOException {
-                    try {
-                        validator.validate(new StreamSource(in));
-                    } catch (final IllegalArgumentException | SAXException e) {
-                        valid.set(false);
-                        exception.set(e);
+            final AtomicReference<Exception> exception = new AtomicReference<>(null);
+            SafeXMLConfiguration safeXMLConfiguration = new SafeXMLConfiguration();
+            safeXMLConfiguration.setValidating(false);
+
+            try {
+                DocumentBuilder docBuilder = safeXMLConfiguration.createDocumentBuilder();
+
+                if (attributeContainsXML) {
+                    // If XML source attribute is set, validate attribute value
+                    String xml = flowFile.getAttribute(context.getProperty(XML_SOURCE_ATTRIBUTE).evaluateAttributeExpressions().getValue());
+                    ByteArrayInputStream bais = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
+
+                    if (validator != null) {
+                        // If schema is provided, validator will be non-null
+                        validator.validate(new StreamSource(bais));
+                    } else {
+                        // Only verify that the XML is well-formed; no schema check
+                        docBuilder.parse(bais);
                     }

Review comment:
       Since this logic is essentially the same as the logic in the callback in 
the `else` block, it should be possible to create a shared function that takes 
an `InputStream`, which could then be called as needed.
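A minimal sketch of such a shared function, using only JDK XML APIs. The names `SharedXmlValidation`, `validate`, `isValid`, `newDocumentBuilder`, and `newValidator` are hypothetical. `newDocumentBuilder` is only a stand-in for `SafeXMLConfiguration.createDocumentBuilder()`: it rejects DOCTYPE declarations through the Xerces `disallow-doctype-decl` feature that the JDK's built-in parser supports, and it is not NiFi's actual implementation. The attribute branch would wrap the value in a `ByteArrayInputStream`, and the content callback would pass the `InputStream` it receives. Both would then call `validate`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SharedXmlValidation {

    // Hypothetical shared helper: full XSD validation when a Validator is available,
    // otherwise a well-formedness check by parsing. Throws SAXException on failure.
    static void validate(final InputStream in, final Validator validator, final DocumentBuilder docBuilder)
            throws IOException, SAXException {
        if (validator != null) {
            validator.validate(new StreamSource(in));
        } else {
            docBuilder.parse(in);
        }
    }

    // Convenience wrapper mirroring the attribute branch: the value is wrapped in a stream
    static boolean isValid(final String xml, final Validator validator, final DocumentBuilder docBuilder)
            throws IOException {
        try {
            validate(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), validator, docBuilder);
            return true;
        } catch (final SAXException e) {
            return false;
        }
    }

    // Stand-in for SafeXMLConfiguration: non-validating builder that rejects DOCTYPE declarations
    static DocumentBuilder newDocumentBuilder() throws ParserConfigurationException {
        final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        final DocumentBuilder builder = factory.newDocumentBuilder();
        // DefaultHandler rethrows fatal (well-formedness) errors without printing them to stderr
        builder.setErrorHandler(new DefaultHandler());
        return builder;
    }

    static Validator newValidator(final String xsd) throws SAXException {
        final SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return factory.newSchema(new StreamSource(new StringReader(xsd))).newValidator();
    }

    public static void main(final String[] args) throws Exception {
        final DocumentBuilder docBuilder = newDocumentBuilder();
        final Validator validator = newValidator(
                "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
                + "<xs:element name=\"note\" type=\"xs:string\"/></xs:schema>");

        System.out.println(isValid("<note>hi</note>", validator, docBuilder)); // true
        System.out.println(isValid("<other/>", validator, docBuilder));        // false: not declared in schema
        System.out.println(isValid("<other/>", null, docBuilder));             // true: well-formed only
        System.out.println(isValid("<note>", null, docBuilder));               // false: not well-formed
    }
}
```

Because `validate` throws instead of catching, the caller can keep the existing pattern of storing the exception for the `validatexml.invalid.error` attribute. `isValid` exists only to keep the demo short.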




-- 
This is an automated message from the Apache Git Service.