Karthik,

This is my go-to article for high-performance tuning and a first read for most. It still applies to newer releases of Apache NiFi. [1]
You should also consider your disk queuing algorithm (I/O scheduler) strategy, depending on whether or not you are on physical servers. On physical hardware with a hardware RAID controller, we've seen improvements from setting the scheduler to "noop". [2]

[1] https://community.hortonworks.com/content/kbentry/7882/hdfnifi-best-practices-for-setting-up-a-high-perfo.html
[2] http://cromwell-intl.com/linux/performance-tuning/disks.html

Hope that helps.

On Mon, Jul 3, 2017 at 1:04 PM Richard St. John <[email protected]> wrote:

> Hi there,
>
> In the beginning of our NiFi adoption, we faced similar issues. For us, we
> clustered NiFi, limited the number of concurrent tasks for each processor,
> and added more logical partitions for the content and provenance
> repositories. Now, we easily process millions of flow files per minute on a
> 5-node cluster with hundreds of processors in the data flow pipeline. When
> we need to ingest more data or process it faster, we simply add more nodes.
>
> First and foremost, clustering NiFi allows horizontal scaling: a must. It
> seems counterintuitive, but limiting the number of concurrent tasks was a
> major performance improvement. Doing so keeps the flow "balanced",
> preventing hotspots within the flow pipeline.
>
> I hope this helps.
>
> Rick.
>
> --
> Richard St. John, PhD
> Asymmetrik
> 141 National Business Pkwy, Suite 110
> Annapolis Junction, MD 20701
>
> On Jul 3, 2017, 12:53 PM -0400, Karthik Kothareddy (karthikk) [CONT - Type
> 2] <[email protected]>, wrote:
> > All,
> >
> > I am currently using NiFi 1.2.0 on a Linux (RHEL) machine. I am using a
> > single instance without any clustering. My machine has ~800GB of RAM and
> > 2.5 TB of disk space (SSDs with RAID 5).
> > I have set my Java heap space values in the "bootstrap.conf" file as
> > below:
> >
> > # JVM memory settings
> > java.arg.2=-Xms40960m
> > java.arg.3=-Xmx81920m
> >
> > # Some custom configurations
> > java.arg.7=-XX:ReservedCodeCacheSize=1024m
> > java.arg.8=-XX:CodeCacheMinimumFreeSpace=10m
> > java.arg.9=-XX:+UseCodeCacheFlushing
> >
> > Now, the problem I am facing when stress testing this instance is that
> > whenever the read/write of data feeds reaches the limit of 5GB (at least
> > that's what I observed), the whole instance runs super slow, meaning the
> > flowfiles move very slowly through the queues. It heavily affects the
> > other process groups as well, which are very simple flows. I tried to
> > read the system diagnostics at that point and saw that all usage is below
> > 20%, including heap usage and flowfile and content repository usage. I
> > tried to capture the status history of the process group at that
> > particular point, and below are some results.
> >
> > [status history images omitted]
> >
> > From the above images it is obvious that the process group is doing a lot
> > of IO at that point. Is there a way to increase the throughput of the
> > instance, given my requirement of tons of reads/writes every hour? Also,
> > all my repositories (flowfile, content and provenance) are on the same
> > disk. I tried to increase all the memory settings I possibly can in both
> > bootstrap.conf and nifi.properties, but to no avail: the whole instance
> > runs very slowly and processes a minimal number of flowfiles. Just to
> > make sure, I created a GenerateFlowFile processor while the system was
> > slow, and to my surprise the rate of flow files generated was less than
> > one per minute (it should fill the queue in less than 5 secs under normal
> > circumstances). Any help on this would be much appreciated.
> >
> > Thanks,
> > Karthik
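On Rick's point about logical partitions: nifi.properties supports multiple content and provenance repository directories, each specified by a uniquely suffixed property name, so the IO can be spread across disks rather than hammering one volume. A minimal sketch, assuming hypothetical mount points /repo0 and /repo1 on separate physical disks:

```properties
# Content repository striped across two disks
nifi.content.repository.directory.default=/repo0/content_repository
nifi.content.repository.directory.cont1=/repo1/content_repository

# Provenance repository striped the same way
nifi.provenance.repository.directory.default=/repo0/provenance_repository
nifi.provenance.repository.directory.prov1=/repo1/provenance_repository

# The flowfile repository takes a single directory; ideally its own disk
nifi.flowfile.repository.directory=/repo0/flowfile_repository
```

The suffixes after the final dot (`cont1`, `prov1`) are arbitrary labels; only uniqueness matters.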
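For reference, here is a quick way on a RHEL box to check which I/O scheduler is active and (as root) switch it to "noop", as suggested earlier in the thread. The device name `sda` is a placeholder; substitute your actual block device:

```shell
# List the schedulers for each block device; the active one is shown
# in square brackets, e.g. "noop [deadline] cfq".
for sched in /sys/block/*/queue/scheduler; do
    [ -e "$sched" ] || continue
    printf '%s: %s\n' "$sched" "$(cat "$sched")"
done

# Switch a single device to noop for the current boot (run as root);
# "sda" is a placeholder for your actual device:
# echo noop > /sys/block/sda/queue/scheduler

# To persist across reboots on RHEL 6/7, add "elevator=noop" to the
# kernel command line in your GRUB configuration.
```

Note the runtime change via /sys does not survive a reboot, which is why the GRUB change is needed as well.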
