Karthik,

This is my go-to article for high-performance tuning, and a good first read
for most people.
It still applies to newer releases of Apache NiFi. [1]


You should also look at your disk I/O scheduler, depending on whether you
are on physical servers or not.
On physical servers with hardware RAID, we've seen a performance increase
from setting the scheduler to "noop". [2]
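
For reference, here is a minimal sketch of checking and switching the
scheduler through sysfs. The device name sda is only a placeholder, so
substitute the disk that holds your repositories. The active scheduler is the
bracketed entry in the file, and the two helper function names are made up for
this example. On newer multi-queue kernels the equivalent of "noop" is listed
as "none".

```shell
# Print the active I/O scheduler: the bracketed entry in the sysfs file.
# $1 is the path to a scheduler file, e.g. /sys/block/sda/queue/scheduler
active_scheduler() {
    sed -n 's/.*\[\(.*\)\].*/\1/p' "$1"
}

# Write a scheduler name ($2) to the sysfs file ($1).
# Needs root, and the change does not persist across reboots.
set_scheduler() {
    echo "$2" > "$1"
}

# Usage, with sda as a placeholder device name:
#   active_scheduler /sys/block/sda/queue/scheduler
#   set_scheduler /sys/block/sda/queue/scheduler noop
```

It is worth checking the current value first and measuring before and after,
since the benefit depends on the RAID controller and workload.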

[1]
https://community.hortonworks.com/content/kbentry/7882/hdfnifi-best-practices-for-setting-up-a-high-perfo.html
[2] http://cromwell-intl.com/linux/performance-tuning/disks.html


Hope that helps.

On Mon, Jul 3, 2017 at 1:04 PM Richard St. John <[email protected]> wrote:

> Hi there,
>
> In the beginning of our NiFi adoption, we faced similar issues. We
> clustered NiFi, limited the number of concurrent tasks for each processor,
> and added more logical partitions for the content and provenance
> repositories. Now we easily process millions of flow files per minute on a
> 5-node cluster with hundreds of processors in the data flow pipeline. When
> we need to ingest more data or process it faster, we simply add more nodes.
>
> First and foremost, clustering NiFi allows horizontal scaling: a must. It
> seems counterintuitive, but limiting the number of concurrent tasks was a
> major performance improvement. Doing so keeps the flow "balanced",
> preventing hotspots within the flow pipeline.
>
> I hope this helps
>
> Rick.
>
> --
> Richard St. John, PhD
> Asymmetrik
> 141 National Business Pkwy, Suite 110
> Annapolis Junction, MD 20701
>
> On Jul 3, 2017, 12:53 PM -0400, Karthik Kothareddy (karthikk) [CONT - Type
> 2] <[email protected]>, wrote:
> > All,
> >
> > I am currently using NiFi 1.2.0 on a Linux (RHEL) machine. I am using a
> > single instance without any clustering. My machine has ~800GB of RAM and
> > 2.5 TB of disk space (SSDs with RAID5). I have set my Java heap space
> > values as below in the “bootstrap.conf” file:
> >
> > # JVM memory settings
> > java.arg.2=-Xms40960m
> > java.arg.3=-Xmx81920m
> >
> > # Some custom Configurations
> > java.arg.7=-XX:ReservedCodeCacheSize=1024m
> > java.arg.8=-XX:CodeCacheMinimumFreeSpace=10m
> > java.arg.9=-XX:+UseCodeCacheFlushing
> >
> > Now, the problem I am facing while stress testing this instance is that
> > whenever the read/write of data feeds reaches the limit of 5GB (at least
> > that’s what I observed), the whole instance runs very slowly, meaning
> > the flowfiles move very slowly through the queues. It also heavily affects
> > the other process groups, which are very simple flows. I tried to
> > read the system diagnostics at that point and saw that all usage is
> > below 20%, including heap usage and flowfile and content repository usage.
> > I tried to capture the status history of the process group at that
> > particular point, and below are some results.
> >
> > [status history images not included]
> >
> > From the above images it is clear that the process group is doing a lot
> > of I/O at that point. Is there a way to increase the throughput of the
> > instance, given that my workload has tons of reads/writes every hour?
> > Also, all my repositories (flowfile, content and provenance) are on the
> > same disk. I tried increasing every memory setting I could in both
> > bootstrap.conf and nifi.properties, but to no avail: the whole instance
> > still runs very slowly and processes a minimal number of flowfiles. Just
> > to make sure, I created a GenerateFlowFile processor while the system was
> > slow, and to my surprise it generated fewer than one flowfile per minute
> > (it should fill the queue in less than 5 seconds under normal
> > circumstances). Any help on this would be much appreciated.
> >
> >
> > Thanks
> > Karthik