I have a NiFi workflow that reads from multiple MySQL binlogs and writes to
HDFS. I am using CaptureChangeMySQL as the source and PutHDFS as the sink,
with MergeContent processors in between to chunk the messages together for
HDFS.

CaptureChangeMySQL (primary node only; ~200 of these processors, one per database)
  -> UpdateAttribute (set a db-table-hour attribute)
  -> MergeContent (500 msgs, binned by db-table-hour)
  -> MergeContent (200 msgs, binned by db-table-hour)
  -> PutHDFS
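
For reference, the db-table-hour attribute is built in UpdateAttribute
roughly like this (the cdc.database and cdc.table attribute names are
illustrative; substitute whatever attributes your CDC events actually carry):

    db-table-hour = ${cdc.database}-${cdc.table}-${now():format('yyyyMMddHH')}

That way all events for the same table within the same hour land in the same bin.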

I have about ~200 databases to read from, and ~2,500 tables altogether. The
update rate for the binlogs is around 1 Mbps per database. I am planning to
run this on a 3-node NiFi cluster for now.
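
Back-of-the-envelope: 200 databases x 1 Mbps = 200 Mbps, i.e. roughly
25 MB/s of binlog traffic in total, and since the CDC processors are
primary-node-only, all of that is ingested on a single node.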

Has anyone used MergeContent with more than 2,000 bins before? Does it scale
well? Can anyone suggest improvements to the workflow, or alternatives?
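
To make the bin question concrete, the first MergeContent would be
configured along these lines (the Max Bin Age value is illustrative; some
age limit is needed so partial bins eventually flush):

    Merge Strategy             : Bin-Packing Algorithm
    Correlation Attribute Name : db-table-hour
    Minimum Number of Entries  : 500
    Maximum Number of Entries  : 500
    Maximum number of Bins     : 2500    (up to one active bin per table)
    Max Bin Age                : 1 hour  (illustrative flush safety net)

With ~2,500 tables, each MergeContent can have up to ~2,500 bins open at
once, which is where the scaling concern comes from.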

Thanks
Ashwin
