[
https://issues.apache.org/jira/browse/NIFI-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15417785#comment-15417785
]
ASF GitHub Bot commented on NIFI-2524:
--------------------------------------
GitHub user YolandaMDavis opened a pull request:
https://github.com/apache/nifi/pull/840
NIFI-2524 - Fixes to improve handling of missing journal files
This fix focuses on preventing some of the side effects seen during
provenance journal rollover, outlined in NIFI-2524, after an OutOfMemoryError
(which appeared to cause journal files to be removed from disk before they
were merged). The goals of this fix were a) to ensure that any information
captured is rolled over and b) to ensure that if no files are available (or if
another error occurs during processing) the system makes a reasonable
effort to retry until the issue is resolved (yet does not retry infinitely). This
PR includes:
- Removal of the partial file check (based on a missing first file)
- Addition of a conditional check to merge if at least one journal file is
available on disk. If all files are missing from disk, that is considered an
error.
- Addition of retry logic to prevent endless thread execution when
encountering errors (such as when all journal files are missing). A retry is
attempted 5 times before the thread is cancelled.
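The conditional merge check can be sketched as follows. This is a minimal illustration and not the code from the PR. The class and method names (`JournalFilterSketch`, `existingJournals`) are hypothetical, and the sketch assumes journals are plain `java.io.File` paths. It keeps only the journal files still present on disk, skips the missing ones, and treats the case where none survive as an error.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class JournalFilterSketch {

    // Hypothetical helper: returns the subset of journal files still present on disk.
    // Missing files are logged and skipped; if none remain, the merge is treated as an error.
    static List<File> existingJournals(List<File> journalFiles) {
        List<File> existing = new ArrayList<>();
        for (File journal : journalFiles) {
            if (journal.exists()) {
                existing.add(journal);
            } else {
                System.err.println("Journal file missing, skipping: " + journal);
            }
        }
        if (existing.isEmpty()) {
            // All files missing: surface an error instead of silently merging nothing.
            throw new UncheckedIOException(new FileNotFoundException(
                    "None of the " + journalFiles.size() + " journal files exist on disk"));
        }
        return existing;
    }

    public static void main(String[] args) throws Exception {
        File present = File.createTempFile("journal", ".0");
        present.deleteOnExit();
        File missing = new File(present.getPath() + ".missing");
        List<File> toMerge = existingJournals(Arrays.asList(present, missing));
        System.out.println("Merging " + toMerge.size() + " of 2 journal files");
    }
}
```

Skipping missing files and merging the rest matches the stated goal of rolling over whatever information was captured. Failing only when nothing is left gives the retry logic a clear error to act on.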
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/YolandaMDavis/nifi NIFI-2524
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/nifi/pull/840.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #840
----
commit 3560bd7bb6a765a0d8a37257e00602a8a22348ed
Author: Yolanda M. Davis
Date: 2016-08-11T09:42:19Z
NIFI-2524 - Fixes to improve handling of missing journal files during
rollover/merge execution. Includes:
Removed partial file check (based on missing first file)
Added condition to merge if at least one journal file is available on disk.
If all files are missing from disk, that is considered an error.
Added retry logic to prevent endless thread execution when encountering
errors (such as missing files).
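The bounded retry can be sketched with a `ScheduledExecutorService`. This is a hypothetical illustration and not the PR's implementation. `BoundedRetrySketch`, `withRetryLimit`, and the choice to count consecutive failures (resetting the counter on success) are assumptions. Only the limit of 5 attempts comes from the PR description. Each failed run of the periodic task is counted, and once the limit is reached the task cancels its own `ScheduledFuture`.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

public class BoundedRetrySketch {

    static final int MAX_ATTEMPTS = 5; // the "5 times" from the PR description

    // Wraps a periodic task: each failure is counted, and once MAX_ATTEMPTS
    // consecutive failures occur the task cancels its own scheduled future.
    static Runnable withRetryLimit(Runnable task, AtomicInteger failures,
                                   AtomicReference<ScheduledFuture<?>> self) {
        return () -> {
            try {
                task.run();
                failures.set(0); // assumption: a success resets the counter
            } catch (RuntimeException e) {
                int count = failures.incrementAndGet();
                System.err.println("Attempt " + count + " failed: " + e.getMessage());
                if (count >= MAX_ATTEMPTS) {
                    ScheduledFuture<?> future = self.get();
                    if (future != null) {
                        future.cancel(false); // stop rescheduling; let this run finish
                    }
                }
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger failures = new AtomicInteger();
        AtomicReference<ScheduledFuture<?>> self = new AtomicReference<>();
        // Hypothetical merge task that always fails, e.g. because every journal file is missing.
        Runnable merge = () -> { throw new IllegalStateException("no journal files found"); };
        self.set(executor.scheduleWithFixedDelay(
                withRetryLimit(merge, failures, self), 100, 100, TimeUnit.MILLISECONDS));
        Thread.sleep(1_000);
        System.out.println("cancelled=" + self.get().isCancelled() + ", failures=" + failures.get());
        executor.shutdownNow();
    }
}
```

Calling `cancel(false)` from inside the task lets the current run complete, but the executor will not schedule it again. The thread therefore stops after the final failed attempt instead of looping forever.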
----
> Unable to Merge Journal Files after OutOfMemory Error
> -----------------------------------------------------
>
> Key: NIFI-2524
> URL: https://issues.apache.org/jira/browse/NIFI-2524
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Reporter: Yolanda M. Davis
> Assignee: Yolanda M. Davis
> Fix For: 1.0.0
>
> Attachments: nifi-app-1.log, nifi-app-2.log
>
>
> While running a flow that used SplitText to split content by end-of-line, an
> OutOfMemoryError occurred (which wasn't surprising, since this was a
> standalone instance with no tuning for memory, backpressure, etc.).
> {noformat}
> ERROR [Timer-Driven Process Thread-9] o.a.nifi.processors.standard.SplitText
> SplitText[id=6bb0fe36-0156-1000-795b-55cf4237a389]
> SplitText[id=6bb0fe36-0156-1000-795b-55cf4237a389] failed to process due to
> java.lang.OutOfMemoryError: Java heap space; rolling back session:
> java.lang.OutOfMemoryError: Java heap space
> 2016-08-08 20:12:36,068 ERROR
> [LeaseRenewer:[email protected]:8020]
> org.apache.nifi.NiFi An Unknown Error Occurred in Thread
> Thread[LeaseRenewer:[email protected]:8020,5,main]:
> java.lang.OutOfMemoryError: Java heap space
> 2016-08-08 20:12:36,072 ERROR [Timer-Driven Process Thread-8]
> o.a.n.p.PersistentProvenanceRepository Failed to persist Provenance Event due
> to java.io.IOException: Stream Closed.
> 2016-08-08 20:12:55,111 ERROR
> [LeaseRenewer:[email protected]:8020]
> org.apache.nifi.NiFi
> java.lang.OutOfMemoryError: Java heap space
> at java.util.ArrayList.iterator(ArrayList.java:834) ~[na:1.8.0_101]
> at
> org.apache.hadoop.hdfs.LeaseRenewer.clientsRunning(LeaseRenewer.java:241)
> ~[na:na]
> at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:491)
> ~[na:na]
> at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
> ~[na:na]
> at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:304)
> ~[na:na]
> at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_101]
> 2016-08-08 20:12:58,022 ERROR [Timer-Driven Process Thread-9]
> o.a.nifi.processors.standard.SplitText
> java.lang.OutOfMemoryError: Java heap space
> at java.util.HashMap.newNode(HashMap.java:1742) ~[na:1.8.0_101]
> at java.util.HashMap.putVal(HashMap.java:630) ~[na:1.8.0_101]
> at java.util.HashMap.putMapEntries(HashMap.java:514) ~[na:1.8.0_101]
> at java.util.HashMap.putAll(HashMap.java:784) ~[na:1.8.0_101]
> at
> org.apache.nifi.controller.repository.StandardFlowFileRecord$Builder.fromFlowFile(StandardFlowFileRecord.java:307)
> ~[nifi-framework-core-1.0.0-BETA.jar:1.0.0-BETA]
> at
> org.apache.nifi.controller.repository.StandardProcessSession.putAttribute(StandardProcessSession.java:1462)
> ~[nifi-framework-core-1.0.0-BETA.jar:1.0.0-BETA]
> at
> org.apache.nifi.processors.standard.SplitText$1.process(SplitText.java:499)
> ~[na:na]
> at
> org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:1880)
> ~[nifi-framework-core-1.0.0-BETA.jar:1.0.0-BETA]
> at
> org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:1851)
> ~[nifi-framework-core-1.0.0-BETA.jar:1.0.0-BETA]
> at
> org.apache.nifi.processors.standard.SplitText.onTrigger(SplitText.java:420)
> ~[na:na]
> at
> org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
> ~[nifi-api-1.0.0-BETA.jar:1.0.0-BETA]
> at
> org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1060)
> [nifi-framework-core-1.0.0-BETA.jar:1.0.0-BETA]
> at
> org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:136)
> [nifi-framework-core-1.0.0-BETA.jar:1.0.0-BETA]
> at
> org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
> [nifi-framework-core-1.0.0-BETA.jar:1.0.0-BETA]
> at
> org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:127)
> [nifi-framework-core-1.0.0-BETA.jar:1.0.0-BETA]
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> [na:1.8.0_101]
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> [na:1.8.0_101]
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> [na:1.8.0_101]
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> [na:1.8.0_101]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [na:1.8.0_101]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [na:1.8.0_101]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
> 2016-08-08 20:13:04,246 ERROR [Timer-Driven Process Thread-8]
> o.a.n.p.PersistentProvenanceRepository
> java.io.IOException: Stream Closed
> at java.io.FileOutputStream.writeBytes(Native Method) ~[na:1.8.0_101]
> at java.io.FileOutputStream.write(FileOutputStream.java:326)
> ~[na:1.8.0_101]
> at
> org.apache.nifi.stream.io.ByteCountingOutputStream.write(ByteCountingOutputStream.java:49)
> ~[nifi-utils-1.0.0-BETA.jar:1.0.0-BETA]
> at
> org.apache.nifi.stream.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:69)
> ~[nifi-utils-1.0.0-BETA.jar:1.0.0-BETA]
> at
> org.apache.nifi.stream.io.BufferedOutputStream.flush(BufferedOutputStream.java:126)
> ~[nifi-utils-1.0.0-BETA.jar:1.0.0-BETA]
> at
> org.apache.nifi.stream.io.ByteCountingOutputStream.flush(ByteCountingOutputStream.java:59)
> ~[nifi-utils-1.0.0-BETA.jar:1.0.0-BETA]
> at
> org.apache.nifi.stream.io.DataOutputStream.flush(DataOutputStream.java:104)
> ~[nifi-utils-1.0.0-BETA.jar:1.0.0-BETA]
> at
> org.apache.nifi.provenance.StandardRecordWriter.writeRecord(StandardRecordWriter.java:247)
> ~[nifi-persistent-provenance-repository-1.0.0-BETA.jar:1.0.0-BETA]
> at
> org.apache.nifi.provenance.PersistentProvenanceRepository.persistRecord(PersistentProvenanceRepository.java:744)
> [nifi-persistent-provenance-repository-1.0.0-BETA.jar:1.0.0-BETA]
> at
> org.apache.nifi.provenance.PersistentProvenanceRepository.registerEvent(PersistentProvenanceRepository.java:405)
> [nifi-persistent-provenance-repository-1.0.0-BETA.jar:1.0.0-BETA]
> at
> org.apache.nifi.controller.repository.StandardProvenanceReporter.send(StandardProvenanceReporter.java:203)
> [nifi-framework-core-1.0.0-BETA.jar:1.0.0-BETA]
> at
> org.apache.nifi.controller.repository.StandardProvenanceReporter.send(StandardProvenanceReporter.java:173)
> [nifi-framework-core-1.0.0-BETA.jar:1.0.0-BETA]
> at
> org.apache.nifi.processors.hadoop.PutHDFS.onTrigger(PutHDFS.java:345)
> [nifi-hdfs-processors-1.0.0-BETA.jar:1.0.0-BETA]
> at
> org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
> [nifi-api-1.0.0-BETA.jar:1.0.0-BETA]
> at
> org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1060)
> [nifi-framework-core-1.0.0-BETA.jar:1.0.0-BETA]
> at
> org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:136)
> [nifi-framework-core-1.0.0-BETA.jar:1.0.0-BETA]
> at
> org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
> [nifi-framework-core-1.0.0-BETA.jar:1.0.0-BETA]
> at
> org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:127)
> [nifi-framework-core-1.0.0-BETA.jar:1.0.0-BETA]
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> [na:1.8.0_101]
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> [na:1.8.0_101]
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> [na:1.8.0_101]
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> [na:1.8.0_101]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [na:1.8.0_101]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [na:1.8.0_101]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
> 2016-08-08 20:13:34,028 ERROR [Timer-Driven Process Thread-9]
> o.a.nifi.processors.standard.SplitText
> SplitText[id=6bb0fe36-0156-1000-795b-55cf4237a389]
> SplitText[id=6bb0fe36-0156-1000-795b-55cf4237a389] failed to process session
> due to java.lang.OutOfMemoryError: Java heap space:
> java.lang.OutOfMemoryError: Java heap space
> 2016-08-08 20:13:34,028 ERROR [Timer-Driven Process Thread-9]
> o.a.nifi.processors.standard.SplitText
> java.lang.OutOfMemoryError: Java heap space
> {noformat}
> However, after stopping the offending processor while allowing downstream
> processors to continue processing (including writing to HDFS), NiFi reported
> ProvenanceRepository warnings rooted in a FileNotFoundException:
> {noformat}
> 2016-08-08 20:28:31,358 WARN [Provenance Repository Rollover Thread-2]
> o.a.n.p.PersistentProvenanceRepository Unable to merge
> ./provenance_repository/journals/91137.journal.0 with other Journal Files due
> to java.io.FileNotFoundException: Unable to locate file
> ./provenance_repository/journals/91137.journal.0
> 2016-08-08 20:28:31,358 WARN [Provenance Repository Rollover Thread-2]
> o.a.n.p.PersistentProvenanceRepository Unable to merge
> ./provenance_repository/journals/91137.journal.1 with other Journal Files due
> to java.io.FileNotFoundException: Unable to locate file
> ./provenance_repository/journals/91137.journal.1
> 2016-08-08 20:28:31,358 WARN [Provenance Repository Rollover Thread-2]
> o.a.n.p.PersistentProvenanceRepository Unable to merge
> ./provenance_repository/journals/91137.journal.2 with other Journal Files due
> to java.io.FileNotFoundException: Unable to locate file
> ./provenance_repository/journals/91137.journal.2
> 2016-08-08 20:28:31,358 WARN [Provenance Repository Rollover Thread-2]
> o.a.n.p.PersistentProvenanceRepository Unable to merge
> ./provenance_repository/journals/91137.journal.3 with other Journal Files due
> to java.io.FileNotFoundException: Unable to locate file
> ./provenance_repository/journals/91137.journal.3
> 2016-08-08 20:28:31,358 WARN [Provenance Repository Rollover Thread-2]
> o.a.n.p.PersistentProvenanceRepository Unable to merge
> ./provenance_repository/journals/91137.journal.4 with other Journal Files due
> to java.io.FileNotFoundException: Unable to locate file
> ./provenance_repository/journals/91137.journal.4
> {noformat}
> These warnings persisted until NiFi was restarted completely. Upon restart,
> the system started successfully, yet it appeared to purge unknown files from
> the ContentRepository (via archiving):
> {noformat}
> 2016-08-08 20:37:48,146 INFO [main] o.a.n.c.repository.FileSystemRepository
> Found unknown file
> /usr/nifi/nifi-1.0.0-BETA/content_repository/210/1470687051782-210 (1049769
> bytes) in File System Repository; archiving file
> 2016-08-08 20:37:48,148 INFO [main] o.a.n.c.repository.FileSystemRepository
> Found unknown file
> /usr/nifi/nifi-1.0.0-BETA/content_repository/99/1470687045979-99 (1049310
> bytes) in File System Repository; archiving file
> 2016-08-08 20:37:48,148 INFO [main] o.a.n.c.repository.FileSystemRepository
> Found unknown file
> /usr/nifi/nifi-1.0.0-BETA/content_repository/481/1470687065579-481 (1048861
> bytes) in File System Repository; archiving file
> 2016-08-08 20:37:48,149 INFO [main] o.a.n.c.repository.FileSystemRepository
> Found unknown file
> /usr/nifi/nifi-1.0.0-BETA/content_repository/281/1470687054990-281 (1049854
> bytes) in File System Repository; archiving file
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)