Long recovery time

Joe Gresock Sun, 08 Oct 2017 08:54:06 -0700

I have a NiFi 1.1.0 instance whose disk nearly (but not quite) filled
up.  I noticed that some of its NiFi processors were hanging so I restarted
it, but it's taking over an hour to come back up.

My question is: how can I tell if NiFi is doing something productive (and
therefore I should just let it finish) vs. hanging (and therefore I should
try something else)? Is it possible that NiFi could take hours to stand
back up? My content_repository is 276GB and my flowfile_repository is
640GB.

I see the following in the logs:

o.a.n.controller.StandardFlowFileQueue Recovered 8 swap files for FlowFileQueue[...] in 51 millis
org.wali.MinimalLockingWriteAheadLog finished recovering records. Performing Checkpoint to ensure proper state of Partitions before updates
org.wali.MinimalLockingWriteAheadLog Successfully recovered 10536141 records in 38509 milliseconds

Thereafter, the only things I see in the logs are these periodic messages:
org.wali.MinimalLockingWriteAheadLog checkpointed with 2 Records and 0 Swap Files in 16 milliseconds, Max Transaction ID 31

I did a thread dump and see pretty standard stuff, including one that I
thought might be relevant:
"main" Id=1 RUNNABLE
at java.util.HashMap.putVal(HashMap.java:641)
at java.util.HashMap.put(HashMap.java:611)
at org.apache.nifi.repository.schema.SchemaRecordReader.readFieldValue(SchemaRecordReader.java:154)

I took a couple dumps in a row in case it was hung here, but it appears to
be progressing to different points in the stack.
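
The comparison of successive dumps can be sketched roughly as below. This
is only an illustration, not anything provided by NiFi: it assumes each
dump was saved as text with thread headers like "main" Id=1 RUNNABLE
followed by unwrapped "at ..." frame lines, and the function names
main_thread_stack and is_progressing are hypothetical.

```python
def main_thread_stack(dump_text, thread_name="main"):
    """Return the list of stack frames for the named thread in one dump.

    Assumes each thread starts with a header line beginning with the
    quoted thread name, followed by lines of the form 'at <frame>'.
    """
    frames = []
    in_thread = False
    for line in dump_text.splitlines():
        stripped = line.strip()
        if stripped.startswith('"'):
            # New thread header, e.g.: "main" Id=1 RUNNABLE
            in_thread = stripped.startswith('"%s"' % thread_name)
            continue
        if in_thread and stripped.startswith("at "):
            frames.append(stripped[3:].strip())
    return frames


def is_progressing(dumps, thread_name="main"):
    """True if the thread's stack differs between any two consecutive dumps."""
    stacks = [main_thread_stack(d, thread_name) for d in dumps]
    return any(a != b for a, b in zip(stacks, stacks[1:]))
```

A changing stack suggests the thread is still doing work. An identical
stack across dumps is not proof of a hang, since the thread may simply be
spending a long time in the same call, so several dumps spaced apart give
a better picture.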

NiFi is the only thing running on this instance, and nearly all of its 48GB
of RAM is in use. Using iostat, I also noticed that it is doing heavy reads
but not many writes.

Thanks,
Joe
--
I know what it is to be in need, and I know what it is to have plenty. I
have learned the secret of being content in any and every situation,
whether well fed or hungry, whether living in plenty or in want. I can do
all this through him who gives me strength. *-Philippians 4:12-13*
