Thanks for following up, Russ, and glad you're making progress.

On Oct 13, 2016 5:36 PM, "Russell Bateman" <[email protected]> wrote:
> As promised, and as embarrassing as it seems now, I'm reporting what
> happened...
>
> It appears that one of our IT guys failed to type "G" when he created the
> swap partition on this staging server, and it ended up sized at 128M
> instead of 128G. (Fortunately, it's not a production server, and I think
> we have safeties in place to guard against similarly screwing up those
> installations.)
>
> This turned up visibly using htop. Unfortunately, though htop was an
> early tool in our quest for what wasn't right, we were thinking it was
> something in NiFi, one of our processors in the flow, etc., concentrating
> on that angle, and we weren't looking at the top section of htop's output
> until after poring through NiFi logs and eliminating all other suspicions.
>
> Live and learn.
>
> Russ
>
> On 10/05/2016 06:21 PM, Andrew Grande wrote:
>
>> Just a sanity check: was the number of open file handles increased as per
>> the quickstart document? You might need many more for your flow.
>>
>> Another tip: when your server experiences undesired hiccups like that,
>> try running 'nifi.sh dump save-in-this-file.txt' and investigate/share
>> where NiFi threads are being held back.
>>
>> Andrew
>>
>> On Tue, Oct 4, 2016, 10:54 AM Russell Bateman <[email protected]> wrote:
>>
>>> We use the templating to create FHIR XML, in this case, a
>>>
>>>     <Binary>
>>>       ...
>>>       <content value="$flowfile_contents" />
>>>     </Binary>
>>>
>>> construct that includes a base-64 encoding of a PDF, the flowfile
>>> contents coming into the templating processor. These can get to be
>>> megabytes in size, though our sample data was just under 1 MB.
>>>
>>> Yesterday, I built a new, reduced flow restricting the use of my
>>> VelocityTemplating processor to perform only the part of that task
>>> that I suspected was taking so much time, that is, copying the
>>> base-64 data into the template in place of the VTL macro.
>>> However, I could not reproduce the problem, though I did this on the
>>> very production server (actually, more of a staging server, but it was
>>> the very server where the trouble was detected in the first place).
>>>
>>> Predictably (that is, if, like me, you believe Murphy reigns supreme
>>> in this universe), the action using the very files in question took
>>> virtually no time at all, just as had been my experience running on my
>>> local development host. I then slightly expanded the new flow to take
>>> in some of the other trappings of the original one (but it was the
>>> templating that was reported as being the bottleneck--minutes to fill
>>> out the template instead of milliseconds). In short, I could not
>>> replicate the problem. True, the moon is in a different phase than
>>> late last week when this was reported.
>>>
>>> For the benefit of the community, I will come back here and report if
>>> and when we stumble upon this again, it recurs, and/or we take a
>>> decision about anything. At present, we're looking to force
>>> re-ingestion of the run, using the original flow design, including the
>>> documents that reportedly experienced this trouble, to see if it
>>> happens yet again.
>>>
>>> In the meantime, I can say:
>>>
>>> - I keep no state in this processor (indeed, I try not to and don't
>>>   think I have anything stateful in any of our custom processors).
>>> - The server has some 40 cores and 128 GB RAM on 12 TB of disk:
>>>   dedicated hardware, CentOS 7, recently built and installed.
>>> - Reportedly, I learned, little else was going on on the server at
>>>   the same time, either in NiFi or elsewhere.
>>> - The NiFi heap is configured to be 12 GB.
>>> - I'm not so far along yet as to understand thread usage or garbage
>>>   collection state.
>>>
>>> Again, thanks for the suggestions from both of you.
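For a sense of the payload sizes involved in the templating step above, base-64 encoding inflates the PDF by a third before it is ever substituted into the template. A minimal stdlib-only sketch (the class name and the `String.replace` substitution are illustrative stand-ins, not the actual Velocity-based processor):

```java
import java.util.Base64;

public class TemplateSketch {
    // Stand-in for the VTL substitution step. The real processor uses
    // Apache Velocity; plain String.replace is used here only to show
    // the shape and cost of the operation.
    static String fill(String template, String payload) {
        return template.replace("$flowfile_contents", payload);
    }

    public static void main(String[] args) {
        // Pretend this is the PDF carried in the flowfile; the real
        // samples ran just under 1 MB.
        byte[] pdfBytes = new byte[900_000];

        // Base64 inflates the payload by 4/3: 900,000 bytes become
        // 1,200,000 characters before padding is even considered.
        String encoded = Base64.getEncoder().encodeToString(pdfBytes);

        String template =
            "<Binary>\n  <content value=\"$flowfile_contents\" />\n</Binary>";
        String filled = fill(template, encoded);

        System.out.println("encoded chars: " + encoded.length());
        System.out.println("filled chars:  " + filled.length());
    }
}
```

Even at these sizes, a single substitution is a memory copy and should take milliseconds, which is consistent with the local timings reported in the thread.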
>>>
>>> Russ
>>>
>>> On 10/03/2016 06:28 PM, Joe Witt wrote:
>>>
>>>> Russ,
>>>>
>>>> As Jeff points out, a lack of available threads could be a factor in
>>>> slower processing times, but this would manifest itself in your
>>>> seeing that the processor isn't running very often. If the process
>>>> itself, when executing, takes much longer than on the other box, then
>>>> it is probably best to look at some other culprits. To check this
>>>> out, you can view the status history and look at the average number
>>>> of tasks and average task time for this processor. Does it look right
>>>> to you in terms of how often it runs and how long it takes, and is
>>>> the amount of time it takes growing?
>>>>
>>>> If you find that the performance of this processor itself is slowing,
>>>> then consider a few things.
>>>> 1) Does it maintain some internal state, and if so, is the data
>>>> structure it is using efficient for lookups?
>>>> 2) How does your heap look? Is there a lot of garbage-collection
>>>> activity? Are there any full garbage collections, and if so, how
>>>> often? It should generally be the case in a well-configured and
>>>> well-designed system that full garbage collections never occur
>>>> (ever).
>>>> 3) Attaching a remote debugger and/or running profilers on it can be
>>>> really illuminating.
>>>>
>>>> Joe
>>>>
>>>> On Mon, Oct 3, 2016 at 11:26 AM, Jeff <[email protected]> wrote:
>>>>
>>>>> Russell,
>>>>>
>>>>> This sounds like it's an environmental issue. Are you able to see
>>>>> the heap usage on the production machine? Are there enough available
>>>>> threads to get the throughput you are observing when you run
>>>>> locally? Have you double-checked the scheduling tab in the processor
>>>>> config to make sure it is running as aggressively as it runs
>>>>> locally?
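Joe's garbage-collection question can also be answered from inside the JVM itself, without a profiler. A minimal sketch using the standard `java.lang.management` API (the class name here is illustrative; the MXBean calls are standard JDK):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcCheck {
    public static void main(String[] args) {
        // One MXBean per collector (typically a young-generation and an
        // old-generation collector). A climbing count/time on the
        // old-gen collector points at the full-GC activity Joe warns about.
        for (GarbageCollectorMXBean gc
                : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(),
                    gc.getCollectionCount(),
                    gc.getCollectionTime());
        }
    }
}
```

Sampling this twice, a few minutes apart, shows whether collection counts and times are growing during the slow templating runs.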
>>>>>
>>>>> I have run into this sort of thing before, and it was because of
>>>>> flowfile congestion in other areas of the flow; there were no
>>>>> threads available for other processors to get through their own
>>>>> queues.
>>>>>
>>>>> Just trying to think through some of the obvious/high-level things
>>>>> that might be affecting your flow...
>>>>>
>>>>> - Jeff
>>>>>
>>>>> On Mon, Oct 3, 2016 at 9:43 AM Russell Bateman <[email protected]> wrote:
>>>>>
>>>>>> We use NiFi for an ETL feed. On one of the lines, we use a custom
>>>>>> processor, VelocityTemplating (which calls Apache Velocity), and it
>>>>>> works very well and indeed is imperceptibly fast when run locally
>>>>>> on the same data (template, VTL macros, substitution fodder).
>>>>>> However, in production it's another matter. What takes no time at
>>>>>> all in local runs takes minutes in that environment.
>>>>>>
>>>>>> I'm looking for suggestions as to a) why this might be and b) how
>>>>>> best to go about examining/debugging it. I think I will soon have
>>>>>> remote access to the production machine (a VPN must be set up).
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Russ
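Jeff's heap and thread-availability questions can likewise be snapshotted programmatically from within the JVM. A minimal stdlib-only sketch (illustrative class name, not part of NiFi), complementary to the `nifi.sh dump` output Andrew suggests:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;

public class JvmSnapshot {
    public static void main(String[] args) {
        // Heap headroom: a 'used' figure pressing against 'max' would
        // explain the GC churn discussed elsewhere in this thread.
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = mem.getHeapMemoryUsage();
        System.out.printf("heap used %d MB of %d MB max%n",
                heap.getUsed() >> 20, heap.getMax() >> 20);

        // Thread availability: a high live count with many BLOCKED
        // threads in the dump suggests the flowfile congestion Jeff
        // describes, where no threads are free for other processors.
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        System.out.println("live threads: " + threads.getThreadCount());

        // Deadlocks show up here as a non-null array of thread IDs.
        long[] deadlocked = threads.findDeadlockedThreads();
        System.out.println("deadlocked: "
                + (deadlocked == null ? 0 : deadlocked.length));
    }
}
```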
