GitHub user bbende commented on the pull request:
https://github.com/apache/flink/pull/1198#issuecomment-144769010
I think the example data flow in NiFi is producing data fairly slowly, because
the Run Schedule on the Scheduling tab of the GenerateFlowFile processor is set
to 2 sec. To make it generate data much faster, stop the processor, right-click
it, select Configure, and set the Run Schedule to 0 sec.
SiteToSiteClientConfig has options to set the batch count, but the example
doesn't set one, so it uses a default. The first instance of the source probably
runs and pulls all the available data, so by the time the second instance runs,
nothing is left.
If you set the batch count lower in the example, like this:
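    import org.apache.nifi.remote.client.SiteToSiteClient;
    import org.apache.nifi.remote.client.SiteToSiteClientConfig;

    SiteToSiteClientConfig clientConfig = new SiteToSiteClient.Builder()
            .url("http://localhost:8080/nifi")
            .portName("Data for Flink")
            .requestBatchCount(5) // preferred number of flow files per pull transaction
            .buildConfig();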
and then have more than 5 flow files queued in NiFi when you start the source
topology, I think the other instances will start pulling as well. This seemed to
work for me with a parallelism of 2, but of course let me know if you are seeing
something different.
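To illustrate why a smaller batch count lets more than one instance receive data, here is a toy model. It is not NiFi's actual Site-to-Site protocol. In the model, parallel source instances take turns requesting a batch from one shared queue. The class `BatchCountSketch` and the method `distribute` are hypothetical names. Treating `batchCount <= 0` as "no limit" is an assumption that stands in for the unset default.

```java
import java.util.Arrays;

public class BatchCountSketch {

    // Hypothetical model: `queued` flow files sit in one shared queue, and
    // `instances` source instances take turns requesting up to `batchCount`
    // of them. batchCount <= 0 is treated as "no limit" (assumption), so a
    // single request drains everything that is currently queued.
    static int[] distribute(int queued, int instances, int batchCount) {
        int[] received = new int[instances];
        int remaining = queued;
        int turn = 0;
        while (remaining > 0) {
            int limit = batchCount <= 0 ? remaining : batchCount;
            int taken = Math.min(limit, remaining);
            received[turn] += taken;
            remaining -= taken;
            turn = (turn + 1) % instances; // next instance gets the next request
        }
        return received;
    }

    public static void main(String[] args) {
        // 12 queued flow files, parallelism of 2
        System.out.println("no limit:      " + Arrays.toString(distribute(12, 2, 0)));
        System.out.println("batch count 5: " + Arrays.toString(distribute(12, 2, 5)));
    }
}
```

With 12 queued flow files and a parallelism of 2, the unlimited case gives [12, 0] and a batch count of 5 gives [7, 5]. In real NiFi the requested batch count is only a preference, and the instances run concurrently, so actual splits will vary.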