Hi, I am experimenting with NiFi for one of our use cases, with plans to extend it to various other data routing and ingestion use cases. Right now I need to ingest data from MySQL binlogs into HDFS/GCS. We have around 250 different schemas and about 3000 tables to read data from. The volume of the data flow ranges from 500 to 2000 messages per second across the different schemas.
The problem right now is that the MySQL CDC processor can run in only one thread. To work around this, I see two options:

1. Use primary-node execution, with a separate processor for each schema. All processors that read from MySQL would then run on a single node, which becomes a bottleneck no matter how big my NiFi cluster is.
2. Use multiple NiFi instances to pull the data, and have a master NiFi cluster handle ingestion to the various sinks. With this approach I would have to manage all of these small NiFi instances, and would probably have to build some kind of tooling on top to monitor them, provision new processors for newly added schemas, etc.

Is there a better way to achieve my use case with NiFi? Please advise me on the architecture. Looking forward to your suggestions.

- Ashwin
