Can we get cadmium rods?

Our downstream users are complaining, and I have seen it myself in testing, that NiFi slows to molasses and the only solution is to bounce it. Here are some comments from my users that I hope are helpful.

   "I'm getting really burned out over the issue of when NiFi
   processors get stuck, you can't get them to stop and the only
   solution is |# systemctl restart ps-nifi|. I actually keep a window
   open in tmux where I run this command all the time so I can just go
   to that window and press up enter to restart it again."

   "I have a /DistributeLoad/ processor that was sitting there doing
   nothing at all even though it said it was running. I tried
   refreshing for a while, and after several minutes I finally tried
   stopping the processor to see if stopping and starting it again
   would help.

   "So I told it to stop, then suddenly NiFi refreshed (even though it
   had been refusing to refresh for several minutes. Seems like it does
   whatever it wants, when it feels like it). Then it turned out that
   that processor actually HAD been running, I just couldn't see it.
   Now I want to start it again, but I can't, because it has a couple
   of stuck threads. So, I resort to |# systemctl restart ps-nifi|. I
   know the purpose of this UI is to give us visibility into the ETL
   process, but if it only gives us visibility when it feels like it,
   and then it only stops a process if it feels like it, it's really
   annoying."

(Of course, some of this is "point of view" and a lack of understanding of what's really going on.)

What we do is ingest millions of medical documents, including plain-text transcripts, HL7 pipe-delimited messages, X12 messages and CDAs (CCDs and CCDAs). These are analyzed for all sorts of important data and transformed into an intermediate format before being committed to a search engine and database for retrieval.

Over the last year or so we've written many dozens of custom processors, most of them very small, and we use many of the processors that come with NiFi to perform this ETL. We are very happy with the visibility NiFi gives us into what used to be a pretty opaque and hard-to-understand ETL component. Our custom processors range from some very specific ones doing document analysis with regular expressions, to more general ones that parse HL7, XML, X12, etc., to ones invoking Tika and cTAKES. This all works very well in theory, but as you can see, there's considerable trouble in practice, and we're having a difficult time tuning the flow, applying back-pressure carefully, etc.

What we think we need, and we're eager for opinions here, is for NiFi to dedicate a thread to the UI so that bouncing NiFi is no longer the only option. We want to be able to reach NiFi and shut things down without the UI being held hostage by threads that are overburdened or hung on tasks and nowhere near returning control to it. I imagine being able to right-click a process group and stop it, like shoving cadmium rods into a radioactive pile to scram NiFi, then examine what's going on and find and tune the parts of our flow that we hadn't previously understood to be problematic. (Of course, what I've just said probably betrays a lack of understanding on my part too.)
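
To make the dedicated-thread idea concrete, here is a minimal sketch in Python rather than NiFi's Java, with every name hypothetical; it is not how NiFi is built, only an illustration. Control commands are served by a thread that never shares the worker pool, so it can still answer while every worker is stuck. Stopping is cooperative: the flag only helps tasks that check it, much as Java's thread interruption only helps code that honors it, so a task that never checks would still be stuck.

```python
import queue
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def worker_task(task_id, stop_event, hang=False):
    """Simulated processor task. A 'hung' task loops until asked to stop."""
    while hang and not stop_event.is_set():
        time.sleep(0.01)  # stands in for a slow or stuck call
    return f"task-{task_id} done"


def control_loop(commands, replies, stop_event):
    """Serves control commands on its own thread, never on the worker pool."""
    while True:
        cmd = commands.get()
        if cmd == "status":
            replies.put("stopping" if stop_event.is_set() else "running")
        elif cmd == "stop":
            stop_event.set()  # a request only: workers must check the flag
            replies.put("stop acknowledged")
        elif cmd == "exit":
            return


def demo(num_workers=2):
    stop_event = threading.Event()
    commands, replies = queue.Queue(), queue.Queue()
    control = threading.Thread(
        target=control_loop, args=(commands, replies, stop_event), daemon=True
    )
    control.start()
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Occupy every worker thread with a hung task.
        futures = [pool.submit(worker_task, i, stop_event, True)
                   for i in range(num_workers)]
        try:
            commands.put("status")
            status = replies.get(timeout=1)  # answered although all workers are busy
            commands.put("stop")
            ack = replies.get(timeout=1)
            results = [f.result(timeout=1) for f in futures]
        finally:
            stop_event.set()  # never leave workers hung if something above fails
    commands.put("exit")
    control.join(timeout=1)
    return status, ack, results
```

Calling demo() returns ("running", "stop acknowledged", ["task-0 done", "task-1 done"]): the status query is answered while both workers are hung, and they finish only once the stop flag is set.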

Also, in my observation, when the number of files and subdirectories under /content_repository/ gets too big, it seems to me that the only thing I can do is smoke them all before starting NiFi back up.
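
To put a number on "too big" rather than judging by feel, something like the following could be run occasionally. It is a sketch only: the repository path and the one-million-file limit are hypothetical placeholders, and walking millions of files is itself expensive, so it is not something to run constantly.

```python
import os


def count_repo_entries(root):
    """Return (file_count, dir_count) for everything beneath root."""
    files = dirs = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        dirs += len(dirnames)
        files += len(filenames)
    return files, dirs


def over_threshold(root, max_files):
    """True when the file count exceeds a hypothetical alerting limit."""
    files, _dirs = count_repo_entries(root)
    return files > max_files


if __name__ == "__main__":
    # Hypothetical path and limit; adjust to the actual NiFi installation.
    repo = "/opt/nifi/content_repository"
    files, dirs = count_repo_entries(repo)
    print(f"{repo}: {files} files, {dirs} directories")
    if over_threshold(repo, 1_000_000):
        print("content repository is over the alerting threshold")
```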

I've been running Java Flight Recorder to spy on our NiFi flows remotely with Java Mission Control. This isn't easily done either, because of how JFR works, and my spyglass goes dark just as our users lose UI responsiveness.
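
One possible way around the spyglass going dark is to have the recording write to local disk inside the NiFi JVM instead of relying on the live remote view, then open the file in Java Mission Control afterward. This sketch assumes a JDK whose jcmd supports JFR.start and JFR.dump with these options (JDK 11 and later do; older JDKs may differ); the PID, recording name, file paths and 30-minute maxage are placeholders. With dumponexit=true the recording should also be written out if the JVM shuts down normally, e.g. when NiFi is restarted, though not if the process is killed outright, and a dump on demand still requires that jcmd can attach.

```python
import shlex
import subprocess


def jfr_start_cmd(pid, name="nifi", filename="/tmp/nifi-flight.jfr", maxage="30m"):
    """jcmd arguments for a disk-backed recording that lives inside the NiFi JVM."""
    return ["jcmd", str(pid), "JFR.start", f"name={name}", "disk=true",
            f"maxage={maxage}", f"filename={filename}", "dumponexit=true"]


def jfr_dump_cmd(pid, name="nifi", filename="/tmp/nifi-snapshot.jfr"):
    """jcmd arguments that write the recording's current contents to a file."""
    return ["jcmd", str(pid), "JFR.dump", f"name={name}", f"filename={filename}"]


def run(cmd, dry_run=True):
    """Return the shell form by default; set dry_run=False to actually invoke jcmd."""
    if dry_run:
        return shlex.join(cmd)  # requires Python 3.8+
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```

For example, run(jfr_dump_cmd(1234, filename="/tmp/snap.jfr")) returns the string "jcmd 1234 JFR.dump name=nifi filename=/tmp/snap.jfr" without executing anything.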

Thoughts?

Russ
