[ 
https://issues.apache.org/jira/browse/NIFI-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16470429#comment-16470429
 ] 

Mark Payne commented on NIFI-5177:
----------------------------------

[~AmitC15] I would recommend that you update your instances to run the 
WriteAheadProvenanceRepository instead of the PersistentProvenanceRepository. 
You can do this by updating the "nifi.provenance.repository.implementation" 
property in conf/nifi.properties from 
"org.apache.nifi.provenance.PersistentProvenanceRepository" to 
"org.apache.nifi.provenance.WriteAheadProvenanceRepository". Also of note: if 
you are running Java 8 with the Garbage First Garbage Collector (G1GC), you'll 
probably want to disable G1GC, because there are known bugs in JDK 8 that can 
cause segmentation faults with Memory Mapped Files. While this can occur with 
the PersistentProvenanceRepository as well, it seems to happen more often with 
the WriteAheadProvenanceRepository. To check or change this, look at 
conf/bootstrap.conf. If you see the line "java.arg.13=-XX:+UseG1GC", comment 
it out.
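
For anyone who needs to apply both edits across several nodes, here is a 
minimal Python sketch of the two changes described above. The function names 
are hypothetical and not part of NiFi; the sketch only rewrites text, assumes 
each file has been read into a string, and assumes you back up the originals 
and restart NiFi afterwards so the new settings are picked up.

```python
OLD_IMPL = "org.apache.nifi.provenance.PersistentProvenanceRepository"
NEW_IMPL = "org.apache.nifi.provenance.WriteAheadProvenanceRepository"
IMPL_KEY = "nifi.provenance.repository.implementation"


def _split_ending(line):
    """Separate a line's text from its trailing newline characters."""
    body = line.rstrip("\r\n")
    return body, line[len(body):]


def switch_provenance_repo(nifi_properties):
    """Return nifi.properties text with the provenance implementation switched.

    Only a line that sets IMPL_KEY to the Persistent implementation is changed;
    every other line is passed through untouched.
    """
    out = []
    for line in nifi_properties.splitlines(keepends=True):
        body, ending = _split_ending(line)
        key, sep, value = body.partition("=")
        if sep and key.strip() == IMPL_KEY and value.strip() == OLD_IMPL:
            body = f"{IMPL_KEY}={NEW_IMPL}"
        out.append(body + ending)
    return "".join(out)


def disable_g1gc(bootstrap_conf):
    """Return bootstrap.conf text with any active -XX:+UseG1GC arg commented out.

    The argument number (13 in the comment above) may differ between installs,
    so any "java.arg.<n>=-XX:+UseG1GC" line is matched. Lines that are already
    commented out start with "#" and are left alone.
    """
    out = []
    for line in bootstrap_conf.splitlines(keepends=True):
        body, ending = _split_ending(line)
        stripped = body.strip()
        if stripped.startswith("java.arg.") and stripped.endswith("=-XX:+UseG1GC"):
            body = "#" + body
        out.append(body + ending)
    return "".join(out)
```

Both functions are idempotent, so running them a second time against an 
already-updated file leaves it unchanged.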

The WriteAheadProvenanceRepository is much newer. It's known to be more stable 
and is far faster than the PersistentProvenanceRepository. I have actually 
created a Jira (NIFI-5181) to update the default to use 
WriteAheadProvenanceRepository. I suspect we will subsequently deprecate the 
PersistentProvenanceRepository and stop maintaining it.

Also of note, if you change to the WriteAheadProvenanceRepository, it will 
honor the provenance data that was stored in the Persistent Provenance 
Repository, so the migration should be painless.

> Failed to merge Journal Files leads to LockObtainFailedException: Lock obtain 
> timed out exception
> -------------------------------------------------------------------------------------------------
>
>                 Key: NIFI-5177
>                 URL: https://issues.apache.org/jira/browse/NIFI-5177
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.5.0
>            Reporter: AmitC15
>            Priority: Critical
>
> NiFi version: 1.5
> Cluster setup + external ZooKeeper on each one of them.
> Log: 
> [ Date ] 2018-05-08 15:53:12,193 [ Priority ] ERROR [ Text 3 ] [Provenance 
> Repository Rollover Thread-1] o.a.n.p.PersistentProvenanceRepository 
> org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: 
> NativeFSLock@/nifi/nifi-1.5.0/provenance_repository/index-1524294029000/write.lock
>  at org.apache.lucene.store.Lock.obtain(Lock.java:89)
>  at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:755)
>  at org.apache.nifi.provenance.lucene.SimpleIndexManager.createWriter(SimpleIndexManager.java:198)
>  at org.apache.nifi.provenance.lucene.SimpleIndexManager.borrowIndexWriter(SimpleIndexManager.java:227)
>  at org.apache.nifi.provenance.PersistentProvenanceRepository.mergeJournals(PersistentProvenanceRepository.java:1712)
>  at org.apache.nifi.provenance.PersistentProvenanceRepository$8.run(PersistentProvenanceRepository.java:1300)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  Happened twice this week on 2 different environments.
> After effects:   
>  * specific node disconnects from cluster (requires restart)
>  * UI not accessible from all nodes.
>  * Also led once to a different issue -  failed to connect node to cluster 
> due to: java.lang.IllegalStateException: Signaled to end recovery, but there 
> are more recovery files for Partition in directory



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
