Hi all,
I have a vanilla NiFi 1.2.0 node with 1 GB of heap.
The flow I am trying to run is:
ListHDFS -> FetchHDFS -> SplitText -> RouteOnContent -> MergeContent -> PutHDFS
When I give it a 300 MB input zip file (2.5 GB uncompressed), I get the
java.lang.OutOfMemoryError shown below.
Does NiFi read the entire contents of a file into memory? That would be
unexpected; I thought it streamed through files in chunks. Adding more RAM
is not a solution, since larger input files can always arrive in the future.
Does this mean NiFi is not suitable as a scalable ETL solution?
Can someone please explain what is happening and how to handle large
files in NiFi? Are there any recommended patterns?
Thanks,
M
ERROR [Timer-Driven Process Thread-9] o.a.nifi.processors.standard.SplitText SplitText[id=e16939ca-f28f-1178-b66e-054e43a0a724] SplitText[id=e16939ca-f28f-1178-b66e-054e43a0a724] failed to process session due to java.lang.OutOfMemoryError: Java heap space: {}
java.lang.OutOfMemoryError: Java heap space
    at java.util.HashMap$EntrySet.iterator(HashMap.java:1013)
    at java.util.HashMap.putMapEntries(HashMap.java:511)
    at java.util.HashMap.<init>(HashMap.java:489)
    at org.apache.nifi.controller.repository.StandardFlowFileRecord$Builder.initializeAttributes(StandardFlowFileRecord.java:219)
    at org.apache.nifi.controller.repository.StandardFlowFileRecord$Builder.addAttributes(StandardFlowFileRecord.java:234)
    at org.apache.nifi.controller.repository.StandardProcessSession.putAllAttributes(StandardProcessSession.java:1723)
    at org.apache.nifi.processors.standard.SplitText.updateAttributes(SplitText.java:367)
    at org.apache.nifi.processors.standard.SplitText.generateSplitFlowFiles(SplitText.java:320)
    at org.apache.nifi.processors.standard.SplitText.onTrigger(SplitText.java:258)
    at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1118)
    at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:144)
    at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)