[ 
https://issues.apache.org/jira/browse/NIFI-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15417785#comment-15417785
 ] 

ASF GitHub Bot commented on NIFI-2524:
--------------------------------------

GitHub user YolandaMDavis opened a pull request:

    https://github.com/apache/nifi/pull/840

    NIFI-2524 - Fixes to improve handling of missing journal files 

    This fix focuses on preventing some of the side effects seen during 
provenance journal rollover, as outlined in NIFI-2524, after an 
OutOfMemoryError (which appeared to cause the removal of journal files from 
disk before they were merged). The goals of this fix are to a) ensure that any 
information captured is rolled over and b) ensure that if no files are 
available (or if another error occurs during processing) the system makes a 
reasonable effort to retry until the issue is resolved (yet does not retry 
infinitely). This PR includes:
    
    - Removal of the partial file check (based on a missing first file)
    - Addition of a conditional check to merge if at least one journal file is 
available on disk. If all files are missing from disk, that is considered an 
error.
    - Addition of retry logic to prevent endless thread execution when 
encountering errors (such as when all journal files are missing). Retry is 
attempted 5 times before the thread is cancelled.
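
The behaviour described in the list above can be sketched roughly as follows. 
This is a minimal illustration, not the code in the pull request: the class 
name JournalRolloverSketch, the Merger interface, the method names, and the 
-1 return convention are all hypothetical. It also assumes "5 times" means 
five attempts in total, and it uses a plain loop where the real repository 
would reschedule and eventually cancel a rollover thread.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from the NiFi code base.
final class JournalRolloverSketch {

    // Assumption: five attempts in total before giving up.
    static final int MAX_ATTEMPTS = 5;

    // Hypothetical stand-in for whatever actually merges journal files.
    interface Merger {
        void merge(List<File> journals) throws IOException;
    }

    // Returns only those journal files that currently exist on disk.
    static List<File> availableJournals(List<File> journals) {
        List<File> present = new ArrayList<>();
        for (File journal : journals) {
            if (journal.exists()) {
                present.add(journal);
            }
        }
        return present;
    }

    // One rollover attempt: merge whatever is present. Only the case where
    // every journal file is missing is treated as an error.
    static void attemptMerge(List<File> journals, Merger merger) throws IOException {
        List<File> present = availableJournals(journals);
        if (present.isEmpty()) {
            throw new FileNotFoundException("No journal files available on disk to merge");
        }
        merger.merge(present);
    }

    // Retries a failed attempt up to MAX_ATTEMPTS times. Returns the number
    // of the attempt that succeeded, or -1 once the limit is reached.
    static int mergeWithRetry(List<File> journals, Merger merger) {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                attemptMerge(journals, merger);
                return attempt;
            } catch (IOException e) {
                // A real implementation would log the failure and wait
                // before the next attempt instead of retrying immediately.
            }
        }
        return -1; // caller would cancel the rollover task at this point
    }
}
```

With this shape, a journal set with some files missing still merges on the 
first attempt, while a set with every file missing fails five times and then 
stops instead of looping forever.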


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/YolandaMDavis/nifi NIFI-2524

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/840.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #840
    
----
commit 3560bd7bb6a765a0d8a37257e00602a8a22348ed
Author: Yolanda M. Davis <[email protected]>
Date:   2016-08-11T09:42:19Z

    NIFI-2524 - Fixes to improve handling of missing journal files during 
rollover/merge execution. Includes:
    
     Removed partial file check (based on missing first file)
     Added condition to merge if at least one journal file is available on 
disk. If all files are missing from disk, that is considered an error.
     Added retry logic to prevent endless thread execution when encountering 
errors (such as missing files).

----


> Unable to Merge Journal Files after OutOfMemory Error
> -----------------------------------------------------
>
>                 Key: NIFI-2524
>                 URL: https://issues.apache.org/jira/browse/NIFI-2524
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>            Reporter: Yolanda M. Davis
>            Assignee: Yolanda M. Davis
>             Fix For: 1.0.0
>
>         Attachments: nifi-app-1.log, nifi-app-2.log
>
>
> While running a flow that attempted to SplitText by EOL, an OutOfMemoryError 
> occurred (which wasn't surprising since this was a standalone instance with 
> no tuning for memory, backpressure, etc). 
> {noformat}
> ERROR [Timer-Driven Process Thread-9] o.a.nifi.processors.standard.SplitText 
> SplitText[id=6bb0fe36-0156-1000-795b-55cf4237a389] 
> SplitText[id=6bb0fe36-0156-1000-795b-55cf4237a389] failed to process due to 
> java.lang.OutOfMemoryError: Java heap space; rolling back session: 
> java.lang.OutOfMemoryError: Java heap space
> 2016-08-08 20:12:36,068 ERROR 
> [LeaseRenewer:[email protected]:8020] 
> org.apache.nifi.NiFi An Unknown Error Occurred in Thread 
> Thread[LeaseRenewer:[email protected]:8020,5,main]: 
> java.lang.OutOfMemoryError: Java heap space
> 2016-08-08 20:12:36,072 ERROR [Timer-Driven Process Thread-8] 
> o.a.n.p.PersistentProvenanceRepository Failed to persist Provenance Event due 
> to java.io.IOException: Stream Closed.
> 2016-08-08 20:12:55,111 ERROR 
> [LeaseRenewer:[email protected]:8020] 
> org.apache.nifi.NiFi 
> java.lang.OutOfMemoryError: Java heap space
>       at java.util.ArrayList.iterator(ArrayList.java:834) ~[na:1.8.0_101]
>       at 
> org.apache.hadoop.hdfs.LeaseRenewer.clientsRunning(LeaseRenewer.java:241) 
> ~[na:na]
>       at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:491) 
> ~[na:na]
>       at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71) 
> ~[na:na]
>       at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:304) 
> ~[na:na]
>       at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_101]
> 2016-08-08 20:12:58,022 ERROR [Timer-Driven Process Thread-9] 
> o.a.nifi.processors.standard.SplitText 
> java.lang.OutOfMemoryError: Java heap space
>       at java.util.HashMap.newNode(HashMap.java:1742) ~[na:1.8.0_101]
>       at java.util.HashMap.putVal(HashMap.java:630) ~[na:1.8.0_101]
>       at java.util.HashMap.putMapEntries(HashMap.java:514) ~[na:1.8.0_101]
>       at java.util.HashMap.putAll(HashMap.java:784) ~[na:1.8.0_101]
>       at 
> org.apache.nifi.controller.repository.StandardFlowFileRecord$Builder.fromFlowFile(StandardFlowFileRecord.java:307)
>  ~[nifi-framework-core-1.0.0-BETA.jar:1.0.0-BETA]
>       at 
> org.apache.nifi.controller.repository.StandardProcessSession.putAttribute(StandardProcessSession.java:1462)
>  ~[nifi-framework-core-1.0.0-BETA.jar:1.0.0-BETA]
>       at 
> org.apache.nifi.processors.standard.SplitText$1.process(SplitText.java:499) 
> ~[na:na]
>       at 
> org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:1880)
>  ~[nifi-framework-core-1.0.0-BETA.jar:1.0.0-BETA]
>       at 
> org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:1851)
>  ~[nifi-framework-core-1.0.0-BETA.jar:1.0.0-BETA]
>       at 
> org.apache.nifi.processors.standard.SplitText.onTrigger(SplitText.java:420) 
> ~[na:na]
>       at 
> org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
>  ~[nifi-api-1.0.0-BETA.jar:1.0.0-BETA]
>       at 
> org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1060)
>  [nifi-framework-core-1.0.0-BETA.jar:1.0.0-BETA]
>       at 
> org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:136)
>  [nifi-framework-core-1.0.0-BETA.jar:1.0.0-BETA]
>       at 
> org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
>  [nifi-framework-core-1.0.0-BETA.jar:1.0.0-BETA]
>       at 
> org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:127)
>  [nifi-framework-core-1.0.0-BETA.jar:1.0.0-BETA]
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [na:1.8.0_101]
>       at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) 
> [na:1.8.0_101]
>       at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>  [na:1.8.0_101]
>       at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>  [na:1.8.0_101]
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_101]
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_101]
>       at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
> 2016-08-08 20:13:04,246 ERROR [Timer-Driven Process Thread-8] 
> o.a.n.p.PersistentProvenanceRepository 
> java.io.IOException: Stream Closed
>       at java.io.FileOutputStream.writeBytes(Native Method) ~[na:1.8.0_101]
>       at java.io.FileOutputStream.write(FileOutputStream.java:326) 
> ~[na:1.8.0_101]
>       at 
> org.apache.nifi.stream.io.ByteCountingOutputStream.write(ByteCountingOutputStream.java:49)
>  ~[nifi-utils-1.0.0-BETA.jar:1.0.0-BETA]
>       at 
> org.apache.nifi.stream.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:69)
>  ~[nifi-utils-1.0.0-BETA.jar:1.0.0-BETA]
>       at 
> org.apache.nifi.stream.io.BufferedOutputStream.flush(BufferedOutputStream.java:126)
>  ~[nifi-utils-1.0.0-BETA.jar:1.0.0-BETA]
>       at 
> org.apache.nifi.stream.io.ByteCountingOutputStream.flush(ByteCountingOutputStream.java:59)
>  ~[nifi-utils-1.0.0-BETA.jar:1.0.0-BETA]
>       at 
> org.apache.nifi.stream.io.DataOutputStream.flush(DataOutputStream.java:104) 
> ~[nifi-utils-1.0.0-BETA.jar:1.0.0-BETA]
>       at 
> org.apache.nifi.provenance.StandardRecordWriter.writeRecord(StandardRecordWriter.java:247)
>  ~[nifi-persistent-provenance-repository-1.0.0-BETA.jar:1.0.0-BETA]
>       at 
> org.apache.nifi.provenance.PersistentProvenanceRepository.persistRecord(PersistentProvenanceRepository.java:744)
>  [nifi-persistent-provenance-repository-1.0.0-BETA.jar:1.0.0-BETA]
>       at 
> org.apache.nifi.provenance.PersistentProvenanceRepository.registerEvent(PersistentProvenanceRepository.java:405)
>  [nifi-persistent-provenance-repository-1.0.0-BETA.jar:1.0.0-BETA]
>       at 
> org.apache.nifi.controller.repository.StandardProvenanceReporter.send(StandardProvenanceReporter.java:203)
>  [nifi-framework-core-1.0.0-BETA.jar:1.0.0-BETA]
>       at 
> org.apache.nifi.controller.repository.StandardProvenanceReporter.send(StandardProvenanceReporter.java:173)
>  [nifi-framework-core-1.0.0-BETA.jar:1.0.0-BETA]
>       at 
> org.apache.nifi.processors.hadoop.PutHDFS.onTrigger(PutHDFS.java:345) 
> [nifi-hdfs-processors-1.0.0-BETA.jar:1.0.0-BETA]
>       at 
> org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
>  [nifi-api-1.0.0-BETA.jar:1.0.0-BETA]
>       at 
> org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1060)
>  [nifi-framework-core-1.0.0-BETA.jar:1.0.0-BETA]
>       at 
> org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:136)
>  [nifi-framework-core-1.0.0-BETA.jar:1.0.0-BETA]
>       at 
> org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
>  [nifi-framework-core-1.0.0-BETA.jar:1.0.0-BETA]
>       at 
> org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:127)
>  [nifi-framework-core-1.0.0-BETA.jar:1.0.0-BETA]
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [na:1.8.0_101]
>       at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) 
> [na:1.8.0_101]
>       at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>  [na:1.8.0_101]
>       at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>  [na:1.8.0_101]
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_101]
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_101]
>       at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
> 2016-08-08 20:13:34,028 ERROR [Timer-Driven Process Thread-9] 
> o.a.nifi.processors.standard.SplitText 
> SplitText[id=6bb0fe36-0156-1000-795b-55cf4237a389] 
> SplitText[id=6bb0fe36-0156-1000-795b-55cf4237a389] failed to process session 
> due to java.lang.OutOfMemoryError: Java heap space: 
> java.lang.OutOfMemoryError: Java heap space
> 2016-08-08 20:13:34,028 ERROR [Timer-Driven Process Thread-9] 
> o.a.nifi.processors.standard.SplitText 
> java.lang.OutOfMemoryError: Java heap space
> {noformat}
> However, after stopping the offending processor while allowing downstream 
> processors to continue processing (including writing to HDFS), NiFi reported 
> ProvenanceRepository warnings rooted in a FileNotFoundException:
> {noformat}
> 2016-08-08 20:28:31,358 WARN [Provenance Repository Rollover Thread-2] 
> o.a.n.p.PersistentProvenanceRepository Unable to merge 
> ./provenance_repository/journals/91137.journal.0 with other Journal Files due 
> to java.io.FileNotFoundException: Unable to locate file 
> ./provenance_repository/journals/91137.journal.0
> 2016-08-08 20:28:31,358 WARN [Provenance Repository Rollover Thread-2] 
> o.a.n.p.PersistentProvenanceRepository Unable to merge 
> ./provenance_repository/journals/91137.journal.1 with other Journal Files due 
> to java.io.FileNotFoundException: Unable to locate file 
> ./provenance_repository/journals/91137.journal.1
> 2016-08-08 20:28:31,358 WARN [Provenance Repository Rollover Thread-2] 
> o.a.n.p.PersistentProvenanceRepository Unable to merge 
> ./provenance_repository/journals/91137.journal.2 with other Journal Files due 
> to java.io.FileNotFoundException: Unable to locate file 
> ./provenance_repository/journals/91137.journal.2
> 2016-08-08 20:28:31,358 WARN [Provenance Repository Rollover Thread-2] 
> o.a.n.p.PersistentProvenanceRepository Unable to merge 
> ./provenance_repository/journals/91137.journal.3 with other Journal Files due 
> to java.io.FileNotFoundException: Unable to locate file 
> ./provenance_repository/journals/91137.journal.3
> 2016-08-08 20:28:31,358 WARN [Provenance Repository Rollover Thread-2] 
> o.a.n.p.PersistentProvenanceRepository Unable to merge 
> ./provenance_repository/journals/91137.journal.4 with other Journal Files due 
> to java.io.FileNotFoundException: Unable to locate file 
> ./provenance_repository/journals/91137.journal.4
> {noformat}
> These warnings persisted until NiFi was restarted completely. Upon restart, 
> the system started successfully yet appeared to purge unknown files from the 
> ContentRepository (via archiving):
> {noformat}
> 2016-08-08 20:37:48,146 INFO [main] o.a.n.c.repository.FileSystemRepository 
> Found unknown file 
> /usr/nifi/nifi-1.0.0-BETA/content_repository/210/1470687051782-210 (1049769 
> bytes) in File System Repository; archiving file
> 2016-08-08 20:37:48,148 INFO [main] o.a.n.c.repository.FileSystemRepository 
> Found unknown file 
> /usr/nifi/nifi-1.0.0-BETA/content_repository/99/1470687045979-99 (1049310 
> bytes) in File System Repository; archiving file
> 2016-08-08 20:37:48,148 INFO [main] o.a.n.c.repository.FileSystemRepository 
> Found unknown file 
> /usr/nifi/nifi-1.0.0-BETA/content_repository/481/1470687065579-481 (1048861 
> bytes) in File System Repository; archiving file
> 2016-08-08 20:37:48,149 INFO [main] o.a.n.c.repository.FileSystemRepository 
> Found unknown file 
> /usr/nifi/nifi-1.0.0-BETA/content_repository/281/1470687054990-281 (1049854 
> bytes) in File System Repository; archiving file
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
