Hi all,

I have a vanilla NiFi 1.2.0 node with 1GB of heap.

The flow I am trying to run is:
ListHDFS -> FetchHDFS -> SplitText -> RouteOnContent -> MergeContent -> PutHDFS

When I give it a 300MB input zip file (2.5GB uncompressed), I get a Java
OutOfMemoryError (stack trace below).
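For a sense of scale, here is a back-of-the-envelope estimate of how many split FlowFiles SplitText would produce if every line became its own split (an assumption; the Line Split Count setting is not shown here). The 100-byte average line length and the 200 bytes of heap per split record are illustrative guesses, not measured NiFi figures.

```java
public class SplitEstimate {
    // Number of split FlowFiles if each line becomes its own split.
    static long estimatedSplits(long uncompressedBytes, long avgLineBytes) {
        return uncompressedBytes / avgLineBytes;
    }

    // Heap needed if every split's in-memory record stays live at the same time.
    // bytesPerSplit is an assumed overhead, not a measured NiFi number.
    static long estimatedHeapBytes(long splits, long bytesPerSplit) {
        return splits * bytesPerSplit;
    }

    public static void main(String[] args) {
        long uncompressed = 2_500_000_000L; // ~2.5GB uncompressed, from the message
        long avgLine = 100;                 // assumed average line length in bytes
        long perSplit = 200;                // assumed heap cost per split record in bytes
        long splits = estimatedSplits(uncompressed, avgLine);
        long heap = estimatedHeapBytes(splits, perSplit);
        System.out.println("splits=" + splits + ", heapBytes=" + heap);
    }
}
```

With those assumed numbers the estimate comes to 25 million splits and about 5 GB of heap, well above the 1GB configured here, even though the file content itself never needs to be held in memory.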

Does NiFi read the entire contents of a file into memory? That is
unexpected; I thought it streamed through files in chunks. Giving it more
RAM is not a solution, since larger input files can always arrive in the
future.
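The stack trace below fails inside SplitText.updateAttributes while a HashMap of FlowFile attributes is being copied, not while file content is being read. The plain-Java sketch below (not NiFi code; the class, method, and attribute names are hypothetical) contrasts reading text as a stream with a splitter that keeps one attribute map per line until the whole batch is finished: the content is streamed in both cases, but the second approach grows with the number of lines.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StreamingVsRetained {
    // Reads the text one line at a time; only the current line is held,
    // so heap use does not depend on the input size.
    static long countLines(BufferedReader reader) {
        return reader.lines().count();
    }

    // Hypothetical splitter that builds one attribute map per line and keeps
    // all of them until the batch ends. The content is still read as a stream,
    // but the list of maps grows linearly with the line count.
    static List<Map<String, String>> retainSplitAttributes(BufferedReader reader, String parentName) {
        List<Map<String, String>> splits = new ArrayList<>();
        reader.lines().forEach(line -> {
            Map<String, String> attrs = new HashMap<>();
            attrs.put("parent", parentName);                          // illustrative key
            attrs.put("index", Integer.toString(splits.size() + 1)); // illustrative key
            attrs.put("length", Integer.toString(line.length()));    // illustrative key
            splits.add(attrs);
        });
        return splits;
    }

    public static void main(String[] args) {
        String text = "a,1\nb,2\nc,3\n";
        System.out.println(countLines(new BufferedReader(new StringReader(text))));
        System.out.println(retainSplitAttributes(
                new BufferedReader(new StringReader(text)), "input.csv").size());
    }
}
```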

Does this mean NiFi is not suitable as a scalable ETL solution?

Can someone please explain what is happening and how to handle large files
in NiFi? Are there any established patterns for this?
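A commonly suggested pattern for this is to chain two SplitText processors: the first with a large Line Split Count (10,000 here is only an illustrative value) and the second with a Line Split Count of 1, so that no single session has to create millions of FlowFiles at once. The plain-Java sketch below (not NiFi code; all names are hypothetical) shows the bounded-memory idea behind the first stage.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class TwoStageSplit {
    // Stage 1: stream lines and hand them on in chunks of chunkSize lines.
    // Only one chunk is buffered at a time, so heap use is bounded by
    // chunkSize rather than by the total number of lines.
    static int splitInChunks(BufferedReader reader, int chunkSize, Consumer<List<String>> sink) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int chunks = 0;
        List<String> chunk = new ArrayList<>(chunkSize);
        Iterator<String> lines = reader.lines().iterator();
        while (lines.hasNext()) {
            chunk.add(lines.next());
            if (chunk.size() == chunkSize) {
                sink.accept(chunk);
                chunks++;
                chunk = new ArrayList<>(chunkSize);
            }
        }
        if (!chunk.isEmpty()) { // flush the final partial chunk
            sink.accept(chunk);
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) {
        StringBuilder text = new StringBuilder();
        for (int i = 1; i <= 25; i++) {
            text.append("record-").append(i).append('\n');
        }
        int[] largestBatch = {0};
        // Stage 2 would split each chunk into single lines on its own; here we only
        // record the chunk size, which caps how many per-line splits one batch creates.
        int chunks = splitInChunks(new BufferedReader(new StringReader(text.toString())), 10,
                chunk -> largestBatch[0] = Math.max(largestBatch[0], chunk.size()));
        System.out.println(chunks + " chunks, largest batch " + largestBatch[0]);
    }
}
```

For 25 input lines and a chunk size of 10 this produces chunks of 10, 10 and 5 lines, so the second stage never handles more than 10 splits per batch regardless of the file size.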

Thanks,
M

ERROR [Timer-Driven Process Thread-9] o.a.nifi.processors.standard.SplitText
SplitText[id=e16939ca-f28f-1178-b66e-054e43a0a724]
SplitText[id=e16939ca-f28f-1178-b66e-054e43a0a724] failed to process
session due to java.lang.OutOfMemoryError: Java heap space: {}

java.lang.OutOfMemoryError: Java heap space
        at java.util.HashMap$EntrySet.iterator(HashMap.java:1013)
        at java.util.HashMap.putMapEntries(HashMap.java:511)
        at java.util.HashMap.<init>(HashMap.java:489)
        at org.apache.nifi.controller.repository.StandardFlowFileRecord$Builder.initializeAttributes(StandardFlowFileRecord.java:219)
        at org.apache.nifi.controller.repository.StandardFlowFileRecord$Builder.addAttributes(StandardFlowFileRecord.java:234)
        at org.apache.nifi.controller.repository.StandardProcessSession.putAllAttributes(StandardProcessSession.java:1723)
        at org.apache.nifi.processors.standard.SplitText.updateAttributes(SplitText.java:367)
        at org.apache.nifi.processors.standard.SplitText.generateSplitFlowFiles(SplitText.java:320)
        at org.apache.nifi.processors.standard.SplitText.onTrigger(SplitText.java:258)
        at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
        at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1118)
        at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:144)
        at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
        at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)
