Scaling source processors in nifi horizontally.

ashwin konale Wed, 17 Oct 2018 11:31:45 -0700

Hi,

I am experimenting with NiFi for one of our use cases, with plans to
extend it to various other data routing and ingestion use cases. Right now I
need to ingest data from MySQL binlogs into HDFS/GCS. We have around 250
different schemas and about 3000 tables to read data from. The volume of the
data flow ranges from 500 to 2000 messages per second across the different
schemas.


Right now the problem is that the MySQL CDC processor can run in only one
thread. To overcome this, I see two options.

1. Use primary node execution, with a separate processor for each of the
schemas. All processors that read from MySQL would then run on a single
node, which will be a bottleneck no matter how big my NiFi cluster is.

2. Use multiple NiFi instances to pull the data, with a master NiFi cluster
handling ingestion into the various sinks. In this approach I would have to
manage all these small NiFi instances, and might have to build some kind of
tooling on top of them to monitor them and provision new processors for
newly added schemas, etc.
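
To make the provisioning part of option 2 more concrete, here is a minimal sketch of how schemas could be assigned to a fixed pool of small instances. It is only an illustration under assumptions: the function names (assign_instance, partition), the pool size of 5 and the schema names are all hypothetical, and nothing here calls NiFi itself. A stable hash of the schema name picks the instance, so a newly added schema gets a deterministic placement without any lookup table.

```python
import hashlib


def assign_instance(schema: str, num_instances: int) -> int:
    """Map a schema name to an instance index using a stable hash.

    hashlib is used instead of the built-in hash() because the latter
    is randomized per process for strings.
    """
    digest = hashlib.sha256(schema.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_instances


def partition(schemas, num_instances):
    """Group schema names by the instance index they are assigned to."""
    groups = {i: [] for i in range(num_instances)}
    for schema in sorted(schemas):
        groups[assign_instance(schema, num_instances)].append(schema)
    return groups


if __name__ == "__main__":
    # Hypothetical names standing in for the ~250 real schemas.
    schemas = [f"schema_{i:03d}" for i in range(250)]
    for index, members in partition(schemas, 5).items():
        print(f"instance {index}: {len(members)} schemas")
```

Note that plain modulo hashing moves most schemas to a different instance when the pool size changes, and it balances schema counts rather than message volume, so busy schemas may still need manual placement.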

Is there a better way to achieve my use case with NiFi? Please advise me
on the architecture.

Looking forward to your suggestions.

- Ashwin
