Can we get cadmium rods?
Our downstream users are complaining, and I have seen it for myself while
testing: NiFi gets into molasses, and the only solution is to bounce it.
Here are some comments from my users that I hope are helpful.
"I'm getting really burned out over the issue of NiFi processors
getting stuck: you can't get them to stop, and the only solution is
|# systemctl restart ps-nifi|. I actually keep a window open in tmux
where I run this command all the time, so I can just go to that window
and press Up and Enter to restart it again."
"I have a /DistributeLoad/ processor that was sitting there doing
nothing at all even though it said it was running. I tried
refreshing for a while, and after several minutes I finally tried
stopping the processor to see if stopping and starting it again
would help.
"So I told it to stop, and suddenly NiFi refreshed (even though it
had been refusing to refresh for several minutes; it seems like it does
whatever it wants, when it feels like it). Then it turned out that
that processor actually HAD been running; I just couldn't see it.
Now I want to start it again, but I can't, because it has a couple
of stuck threads. So, I resort to |# systemctl restart ps-nifi|. I
know the purpose of this UI is to give us visibility into the ETL
process, but if it only gives us visibility when it feels like it,
and then it only stops a process if it feels like it, it's really
annoying."
(Of course, some of this is a matter of point of view and of not
understanding what's really going on.)
What we do is ingest millions of medical documents, including plain-text
transcripts, HL7 pipe-delimited messages, X12 messages, and CDAs (CCDs and
CCDAs). These are analyzed for all sorts of important data and transformed
into an intermediate format before being committed to a search engine and
a database for retrieval.
Over the last year or so we've written many dozens of custom processors,
most of them very small, and we also use many of the processors that come
with NiFi to perform this ETL. We're very happy with the visibility NiFi
gives us into what used to be a pretty opaque and hard-to-understand ETL
component. Our custom processors range from very specific ones that do
document analysis with regular expressions, to more general ones that
parse HL7, XML, X12, etc., to ones that invoke Tika and cTAKES. This all
works very well in theory, but as you can see, we're having considerable
trouble in practice, and we're finding the flow difficult to tune, even
with careful back-pressure and the like.
What we think we need (and we're eager for opinions here) is for NiFi to
dedicate a thread to the UI so that bouncing NiFi is no longer the only
option. We want to be able to reach it and shut things down without the UI
being held hostage by threads that are overloaded, or hung on tasks that
are nowhere near finishing. I imagine being able to right-click a process
group and stop it, like shoving cadmium rods into a nuclear pile: scram
NiFi, examine what's going on, and find and tune the parts of our flow
that we hadn't previously understood were problematic. (Of course, what
I've just said probably betrays a lack of understanding on my part too.)
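For what it's worth, the stop that a right-click performs can also be scripted. Below is a minimal Python sketch, assuming an unsecured NiFi 1.x instance whose REST API accepts |PUT /flow/process-groups/{id}| with a body of |{"id": ..., "state": "STOPPED"}|; the base URL, the group id, and the function names are hypothetical placeholders. This goes through the same web server as the UI, so it is a way to script the stop rather than the dedicated control channel asked for above, and it may hang exactly when the UI does.

```python
import json
import urllib.request

# Hypothetical base URL for an unsecured NiFi 1.x instance.
NIFI_API = "http://localhost:8080/nifi-api"


def build_stop_request(api_base, group_id):
    """Build (but do not send) a PUT asking NiFi to stop every
    component in the given process group."""
    body = json.dumps({"id": group_id, "state": "STOPPED"}).encode("utf-8")
    return urllib.request.Request(
        url=f"{api_base}/flow/process-groups/{group_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def stop_process_group(api_base, group_id, timeout=10):
    """Send the stop request and return the HTTP status code.

    Raises an exception on connection failure or timeout, which is
    what to expect if the web server itself is unresponsive.
    """
    request = build_stop_request(api_base, group_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Something like |stop_process_group(NIFI_API, "<group-id>")|, with a real process group id, could sit in that tmux window next to the |systemctl| command as a first, gentler thing to try.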
Also, in my observation, when the number of files and subdirectories
under /content_repository/ gets too large, it seems the only thing I can
do is smoke (delete) them all before starting NiFi back up.
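Rather than discovering the problem only at restart time, the growth could be watched ahead of time. Here is a minimal Python sketch, assuming the repository path is passed on the command line (the |./content_repository| default in the script is a hypothetical placeholder): it walks the tree and reports how many files and subdirectories it holds, so it could be run from cron and logged. What counts as "too large" is not established here, and walking a tree with millions of entries is itself slow, so it should be run sparingly.

```python
import os
import sys


def count_repo_entries(root):
    """Walk root and return (file_count, dir_count) for everything beneath it.

    A missing root yields (0, 0) because os.walk silently skips it.
    """
    files = dirs = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        dirs += len(dirnames)
        files += len(filenames)
    return files, dirs


if __name__ == "__main__":
    # Hypothetical default; pass the real content repository path as argv[1].
    root = sys.argv[1] if len(sys.argv) > 1 else "./content_repository"
    file_count, dir_count = count_repo_entries(root)
    print(f"{root}: {file_count} files in {dir_count} subdirectories")
```

Logging these two numbers over time would at least show whether the repository grows steadily or jumps just before the UI stops responding.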
I've been running Java Flight Recorder, trying to spy on our NiFi flows
remotely with Java Mission Control. That isn't easy either, given how JFR
works, and my spyglass goes dark just as our users lose UI
responsiveness.
Thoughts?
Russ