[ 
https://issues.apache.org/jira/browse/NIFI-7856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203535#comment-17203535
 ] 

Mengze Li commented on NIFI-7856:
---------------------------------

Here is the stack trace from one incident; hopefully it helps.
I have also attached the ls output. The files all appear to have been
compressed successfully, yet the logs report that the source file does not
exist.
Could this be a race condition?

{code}
2020-09-27 21:37:34,747 INFO [Clustering Tasks Thread-3] 
o.a.n.c.c.ClusterProtocolHeartbeater Heartbeat created at 2020-09-27 
21:37:34,616 and sent to 10.51.8.18:9999 at 2020-09-27 21:37:34,747; send took 
131 millis
2020-09-27 21:37:39,660 INFO [pool-15-thread-1] 
o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile 
Repository
2020-09-27 21:37:39,660 INFO [pool-15-thread-1] 
o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile 
Repository with 15079 records in 0 milliseconds
2020-09-27 21:37:49,109 INFO [pool-61-thread-1] 
c.a.s.k.clientlibrary.lib.worker.Worker Current stream shard assignments: 
shardId-000000000000
2020-09-27 21:37:49,110 INFO [pool-61-thread-1] 
c.a.s.k.clientlibrary.lib.worker.Worker Sleeping ...
2020-09-27 21:37:59,660 INFO [pool-15-thread-1] 
o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile 
Repository
2020-09-27 21:37:59,660 INFO [pool-15-thread-1] 
o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile 
Repository with 15079 records in 0 milliseconds
2020-09-27 21:38:02,196 INFO [pool-43-thread-1] 
c.a.s.k.clientlibrary.lib.worker.Worker Current stream shard assignments: 
shardId-000000000012
2020-09-27 21:38:02,196 INFO [pool-43-thread-1] 
c.a.s.k.clientlibrary.lib.worker.Worker Sleeping ...
2020-09-27 21:38:19,660 INFO [pool-15-thread-1] 
o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile 
Repository
2020-09-27 21:38:19,660 INFO [pool-15-thread-1] 
o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile 
Repository with 15079 records in 0 milliseconds
2020-09-27 21:38:20,688 INFO [Timer-Driven Process Thread-6] 
o.a.nifi.groups.StandardProcessGroup 
StandardProcessGroup[identifier=9e102d08-0174-1000-ffff-ffffdb703545,name=ContactLookup]
 is not the most recent version of the flow that is under Version Control; 
current version is 3; most recent version is 7
2020-09-27 21:38:20,691 INFO [Timer-Driven Process Thread-6] 
o.a.nifi.groups.StandardProcessGroup 
StandardProcessGroup[identifier=4b226950-0174-1000-0000-000064a82b74,name=EcomdashOrderProcessingMain]
 is not the most recent version of the flow that is under Version Control; 
current version is 8; most recent version is 10
2020-09-27 21:38:20,694 INFO [Timer-Driven Process Thread-6] 
o.a.nifi.groups.StandardProcessGroup 
StandardProcessGroup[identifier=e366c899-0173-1000-0000-000026d80b41,name=ContactLookup]
 is not the most recent version of the flow that is under Version Control; 
current version is 5; most recent version is 7
2020-09-27 21:38:20,697 INFO [Timer-Driven Process Thread-6] 
o.a.nifi.groups.StandardProcessGroup 
StandardProcessGroup[identifier=a17c8629-0173-1000-0000-0000055a79e8,name=HandleFailedMessages]
 is not the most recent version of the flow that is under Version Control; 
current version is 2; most recent version is 3
2020-09-27 21:38:34,799 INFO [Framework Task Thread Thread-3] 
o.a.n.p.store.WriteAheadStorePartition Successfully rolled over Event Writer 
for Provenance Event Store Partition[directory=./provenance_repository] due to 
MAX_TIME_REACHED
2020-09-27 21:38:34,799 ERROR [Compress Provenance Logs-1-thread-2] o.a.n.p.s.EventFileCompressor Failed to compress ./provenance_repository/1693519.prov on rollover
java.io.FileNotFoundException: ./provenance_repository/1693519.prov (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at org.apache.nifi.provenance.serialization.EventFileCompressor.compress(EventFileCompressor.java:164)
        at org.apache.nifi.provenance.serialization.EventFileCompressor.run(EventFileCompressor.java:115)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
2020-09-27 21:38:34,799 WARN [Compress Provenance Logs-1-thread-2] o.a.n.p.s.EventFileCompressor Failed to delete ./provenance_repository/1693519.prov; this file should be cleaned up manually
2020-09-27 21:38:34,887 INFO [Clustering Tasks Thread-3] 
o.a.n.c.c.ClusterProtocolHeartbeater Heartbeat created at 2020-09-27 
21:38:34,748 and sent to 10.51.8.18:9999 at 2020-09-27 21:38:34,887; send took 
139 millis
2020-09-27 21:38:39,660 INFO [pool-15-thread-1] 
o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile 
Repository
2020-09-27 21:38:39,660 INFO [pool-15-thread-1] 
o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile 
Repository with 15079 records in 0 milliseconds
2020-09-27 21:38:54,111 INFO [pool-61-thread-1] 
c.a.s.k.clientlibrary.lib.worker.Worker Current stream shard assignments: 
shardId-000000000000
2020-09-27 21:38:54,111 INFO [pool-61-thread-1] 
c.a.s.k.clientlibrary.lib.worker.Worker Sleeping ...
2020-09-27 21:38:59,661 INFO [pool-15-thread-1] 
o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile 
Repository
2020-09-27 21:38:59,661 INFO [pool-15-thread-1] 
o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile 
Repository with 15079 records in 0 milliseconds
2020-09-27 21:39:03,202 INFO [pool-43-thread-1] 
c.a.s.k.clientlibrary.lib.worker.Worker Current stream shard assignments: 
shardId-000000000012
2020-09-27 21:39:03,202 INFO [pool-43-thread-1] 
c.a.s.k.clientlibrary.lib.worker.Worker Sleeping ...
2020-09-27 21:39:19,661 INFO [pool-15-thread-1] 
o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile 
Repository
2020-09-27 21:39:19,661 INFO [pool-15-thread-1] 
o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile 
Repository with 15079 records in 0 milliseconds
2020-09-27 21:39:20,156 INFO [Write-Ahead Local State Provider Maintenance] 
org.wali.MinimalLockingWriteAheadLog 
org.wali.MinimalLockingWriteAheadLog@1fe275d8 checkpointed with 4 Records and 0 
Swap Files in 4 milliseconds (Stop-the-world time = 0 milliseconds, Clear Edit 
Logs time = 0 millis), max Transaction ID 1312
{code}
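
To check the observation above across more than one incident, something like
the following sketch might help. It is not part of NiFi; the class name
FailedCompressionCheck and its two command-line arguments (path to the
application log, path to the NiFi home directory) are made up for
illustration. It pulls every path out of "Failed to compress ... on rollover"
messages and reports whether the raw file or a compressed counterpart is on
disk now. It assumes each log message sits on a single line (unlike the
wrapped excerpt in this mail) and that a compressed file is named by
appending ".gz" to the original name, which should be verified against the
ls output.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical diagnostic helper, not part of NiFi.
class FailedCompressionCheck {

    // Matches the ERROR message text seen in the log excerpt in this thread.
    private static final Pattern FAILED =
            Pattern.compile("Failed to compress (\\S+\\.prov) on rollover");

    // Returns the event file paths named in "Failed to compress" messages.
    static List<String> extractFailedPaths(List<String> logLines) {
        List<String> paths = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = FAILED.matcher(line);
            if (m.find()) {
                paths.add(m.group(1));
            }
        }
        return paths;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: FailedCompressionCheck <app-log> <nifi-home>");
            return;
        }
        Path log = Paths.get(args[0]);
        Path nifiHome = Paths.get(args[1]);

        for (String logged : extractFailedPaths(Files.readAllLines(log))) {
            // The logged paths are relative (./provenance_repository/...),
            // so resolve them against the NiFi home directory.
            Path raw = nifiHome.resolve(logged).normalize();
            // Assumption: the compressed file is "<name>.prov.gz".
            Path gz = Paths.get(raw.toString() + ".gz");
            System.out.printf("%s raw=%b gz=%b%n",
                    logged, Files.exists(raw), Files.exists(gz));
        }
    }
}
```

If most entries show the raw file absent and the compressed file present,
that would be consistent with the file having been compressed and removed
before the failing task opened it, though it would not by itself prove a
race.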

> Provenance failed to be compressed after nifi upgrade to 1.12
> -------------------------------------------------------------
>
>                 Key: NIFI-7856
>                 URL: https://issues.apache.org/jira/browse/NIFI-7856
>             Project: Apache NiFi
>          Issue Type: Bug
>    Affects Versions: 1.12.0
>            Reporter: Mengze Li
>            Priority: Major
>         Attachments: ls.png, screenshot-1.png
>
>
> We upgraded our nifi cluster from 1.11.3 to 1.12.0.
> The nodes come up and everything looks to be functional. I can see 1.12.0 is 
> running.
> Later on, we discovered that the data provenance is missing. From checking 
> our logs, we see tons of errors compressing the logs.
> {code}
> 2020-09-28 03:38:35,205 ERROR [Compress Provenance Logs-1-thread-1] 
> o.a.n.p.s.EventFileCompressor Failed to compress 
> ./provenance_repository/2752821.prov on rollover
> {code}
> This didn't happen in 1.11.3. 
> Is this a known issue? We are considering reverting if there is no solution, 
> since we can't go to production with missing or broken data provenance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
