Andy LoPresto created NIFI-6999:
-----------------------------------

             Summary: Encrypt Config Toolkit fails on very large flow.xml.gz files
                 Key: NIFI-6999
                 URL: https://issues.apache.org/jira/browse/NIFI-6999
             Project: Apache NiFi
          Issue Type: Improvement
          Components: Tools and Build
    Affects Versions: 1.10.0, 1.2.0
            Reporter: Andy LoPresto
            Assignee: Andy LoPresto


A user reported a failure when using the encrypt config toolkit to encrypt a 
large {{flow.xml.gz}}. The file was 49 MB compressed and 687 MB uncompressed, 
and it contained 545 encrypted values and approximately 90 templates. The 
toolkit failed during {{loadFlowXml()}} unless the invocation raised the 
maximum heap to 8 GB via {{-Xms2g -Xmx8g}}. Even with the expanded heap, 
serializing the newly encrypted flow XML to the file system failed with the 
following exception:

{code}
Exception in thread "main" java.lang.OutOfMemoryError: Requested array size exceeds VM limit
at java.lang.StringCoding.encode(StringCoding.java:350)
at java.lang.String.getBytes(String.java:941)
at org.apache.commons.io.IOUtils.write(IOUtils.java:1857)
at org.apache.commons.io.IOUtils$write$0.call(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:141)
at org.apache.nifi.properties.ConfigEncryptionTool$_writeFlowXmlToFile_closure5$_closure20.doCall(ConfigEncryptionTool.groovy:692)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:294)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1019)
at groovy.lang.Closure.call(Closure.java:426)
at groovy.lang.Closure.call(Closure.java:442)
at org.codehaus.groovy.runtime.IOGroovyMethods.withCloseable(IOGroovyMethods.java:1622)
at org.codehaus.groovy.runtime.NioGroovyMethods.withCloseable(NioGroovyMethods.java:1754)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.groovy.runtime.metaclass.ReflectionMetaMethod.invoke(ReflectionMetaMethod.java:54)
at org.codehaus.groovy.runtime.metaclass.NewInstanceMetaMethod.invoke(NewInstanceMetaMethod.java:56)
at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoMetaMethodSiteNoUnwrapNoCoerce.invoke(PojoMetaMethodSite.java:274)
at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:56)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
at org.apache.nifi.properties.ConfigEncryptionTool$_writeFlowXmlToFile_closure5.doCall(ConfigEncryptionTool.groovy:691)
{code}
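
The top frames show the entire flow being encoded through a single {{String.getBytes()}} call. A rough sketch of why that may hit the array limit regardless of heap size follows. It assumes Java 8-style sizing, where the output array is allocated at the charset's maximum bytes per character, and it assumes the 687 MB figure means roughly 687 MiB of mostly single-byte characters. The class and method names are hypothetical and are not part of the toolkit.

```java
import java.nio.charset.StandardCharsets;

public class EncodeBufferEstimate {

    // Worst-case size of the byte[] needed to encode charCount chars as UTF-8,
    // assuming the buffer is sized at the encoder's maxBytesPerChar.
    static long worstCaseBytes(long charCount) {
        float maxBytesPerChar = StandardCharsets.UTF_8.newEncoder().maxBytesPerChar();
        return (long) (charCount * (double) maxBytesPerChar);
    }

    public static void main(String[] args) {
        // Assumption: about 687 MiB of XML, one char per byte.
        long chars = 687L * 1024 * 1024;
        long needed = worstCaseBytes(chars);

        System.out.println("worst-case buffer bytes: " + needed);
        System.out.println("largest Java array index: " + Integer.MAX_VALUE);
        System.out.println("exceeds array limit: " + (needed > Integer.MAX_VALUE));
    }
}
```

If this estimate holds, no {{-Xmx}} value would help, because the limit is on the length of a single array rather than on total heap.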

The immediate fix was to remove the duplicated template definitions from the 
flow definition, returning the file to a reasonable size. However, if the 
toolkit is run as an in-place replacement, this failure can cause the 
{{flow.xml.gz}} to be overwritten with an empty file, potentially leading to 
data loss. The following steps should be taken:

# Guard against loading/operating on/serializing large files (log statements, 
simple conditional checks)
# Handle large files internally (change from direct {{String}} access to 
{{BufferedInputStream}}, etc.)
# Document the internal memory usage of the toolkit in the toolkit guide
# Document best practices and steps to resolve issue in the toolkit guide
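
The first two steps might look something like the sketch below. It is an illustration only, not the toolkit's implementation: the class name, method names, and the threshold are hypothetical, and the encryption step itself is omitted. It shows a simple size check, a chunked copy through buffered gzip streams instead of one large {{String}}, and a write to a temporary file that replaces the target only after the copy succeeds, so a failure partway through leaves the original untouched.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class FlowXmlStreamingSketch {

    // Hypothetical threshold for logging a warning; tune to the deployment.
    static final long WARN_THRESHOLD_BYTES = 10L * 1024 * 1024;

    // Simple conditional check on the compressed file size.
    static boolean isLarge(Path file, long thresholdBytes) throws IOException {
        return Files.size(file) > thresholdBytes;
    }

    // Streams the gzip content of source into target in small chunks.
    // A real tool would transform the content between read and write.
    static void rewriteGzip(Path source, Path target) throws IOException {
        Path parent = target.toAbsolutePath().getParent();
        Path temp = Files.createTempFile(parent, "flow", ".tmp");
        try {
            try (Reader reader = new BufferedReader(new InputStreamReader(
                         new GZIPInputStream(Files.newInputStream(source)),
                         StandardCharsets.UTF_8));
                 Writer writer = new BufferedWriter(new OutputStreamWriter(
                         new GZIPOutputStream(Files.newOutputStream(temp)),
                         StandardCharsets.UTF_8))) {
                char[] buffer = new char[8192];
                int read;
                while ((read = reader.read(buffer)) != -1) {
                    writer.write(buffer, 0, read);
                }
            }
            // Replace the target only after the temp file is fully written.
            Files.move(temp, target, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            // No-op after a successful move; cleans up after a failure.
            Files.deleteIfExists(temp);
        }
    }

    public static void main(String[] args) throws IOException {
        Path flow = Path.of(args[0]);
        if (isLarge(flow, WARN_THRESHOLD_BYTES)) {
            System.err.println("Warning: " + flow + " is larger than "
                    + WARN_THRESHOLD_BYTES + " bytes compressed");
        }
        rewriteGzip(flow, flow);
    }
}
```

Because memory use is bounded by the buffer size rather than the document size, this approach would not depend on the heap settings described above.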



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
