GitHub user fmthoma commented on a diff in the pull request:
https://github.com/apache/flink/pull/6021#discussion_r190154347
--- Diff:
flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java
---
@@ -326,6 +342,24 @@ private void checkAndPropagateAsyncError() throws Exception {
}
}
+ /**
+ * If the internal queue of the {@link KinesisProducer} gets too long,
+ * flush some of the records until we are below the limit again.
+ * We don't want to flush _all_ records at this point since that would
+ * break record aggregation.
+ */
+ private void checkQueueLimit() {
+ while (producer.getOutstandingRecordsCount() >= queueLimit) {
+ producer.flush();
--- End diff --
@tzulitai @bowenli86 I've given this some more thought. `wait()`/`notify()`
requires a `synchronized` block, so if we simply notify some lock in the
callback, every callback pays the synchronization overhead. To avoid that, we'd
have to detect the transition from »queue size > queue limit« to »queue size
<= queue limit« and synchronize only then, which adds a lot of complexity.
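For comparison, the wait/notify alternative might look roughly like this. It is only a sketch: `BackpressureGate` and its methods are hypothetical names, not Flink or KPL API. It assumes a single producer thread calls `recordAdded()` and `awaitBelowLimit()`, completion callbacks call `recordCompleted()`, and the limit is at least 1. Only the callback that moves the count from the limit to one below it takes the monitor, which is the transition bookkeeping described above.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical back-pressure gate (not part of Flink or the KPL). The producer
 * thread blocks while the outstanding count is at or above the limit; completion
 * callbacks only take the monitor when the count drops from the limit to limit - 1.
 */
final class BackpressureGate {
    private final long limit;
    private final AtomicLong outstanding = new AtomicLong();
    private final Object lock = new Object();

    BackpressureGate(long limit) {
        this.limit = limit;
    }

    /** Called by the producer thread for each record handed to the producer. */
    void recordAdded() {
        outstanding.incrementAndGet();
    }

    /** Called from the completion callback; synchronizes only on the downward transition. */
    void recordCompleted() {
        long now = outstanding.decrementAndGet();
        if (now == limit - 1) {
            // notifyAll() requires holding the monitor, so we pay for
            // synchronization only when the count crosses below the limit.
            synchronized (lock) {
                lock.notifyAll();
            }
        }
    }

    /** Blocks the caller until fewer than {@code limit} records are outstanding. */
    void awaitBelowLimit() throws InterruptedException {
        synchronized (lock) {
            // Re-check under the lock to guard against spurious wake-ups.
            while (outstanding.get() >= limit) {
                lock.wait();
            }
        }
    }

    long outstanding() {
        return outstanding.get();
    }
}
```

Even in this reduced form, correctness hinges on the exact `== limit - 1` check and on re-checking the condition under the lock, which is the kind of subtlety the comment argues is not worth it.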
On the other hand, Kinesis accepts up to 1MB per second per shard. The
queue limit should be chosen so that some data can still be accumulated before
sending, i.e. more than a second's worth of data (more than 1MB per shard). If
the queue limit is chosen adequately, then the `Thread.sleep(500)` does no
harm, as the queued records take more than one second to flush anyway. If the
queue limit is chosen too low, then sleeping half a second may be too long, but
we would not reach maximum throughput anyway because of the limit on the
number of `Put` requests.
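The sleep-based approach needs no shared lock and no transition detection. Below is a minimal sketch of it together with the drain-time arithmetic from the paragraph above. It assumes the queue size is measured in bytes (the KPL's `getOutstandingRecordsCount()` counts records, so this is a simplification) and takes 1MB as 10^6 bytes. `SleepBackpressure`, `secondsToDrain` and `waitUntilBelow` are hypothetical helpers.

```java
import java.util.function.LongSupplier;

/** Hypothetical helpers illustrating the sleep-based back-pressure argument. */
final class SleepBackpressure {
    /** Per-shard write limit cited in the comment (1MB/s), assumed here to be 10^6 bytes. */
    static final long BYTES_PER_SECOND_PER_SHARD = 1_000_000L;

    private SleepBackpressure() {}

    /** Lower bound on the time needed to drain {@code queuedBytes} across {@code shards} shards. */
    static double secondsToDrain(long queuedBytes, int shards) {
        return (double) queuedBytes / (shards * (double) BYTES_PER_SECOND_PER_SHARD);
    }

    /**
     * Polls the outstanding count and sleeps between checks; callbacks never
     * touch a lock. Returns how many times it slept.
     */
    static int waitUntilBelow(LongSupplier outstanding, long limit, long sleepMillis)
            throws InterruptedException {
        int sleeps = 0;
        while (outstanding.getAsLong() >= limit) {
            Thread.sleep(sleepMillis);
            sleeps++;
        }
        return sleeps;
    }
}
```

For example, with 4 shards and about 8MB queued, `secondsToDrain(8_000_000L, 4)` returns 2.0, so waking up as much as 500 ms late adds only a fraction of the time the queue needs to drain anyway.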
I think it's not worth the additional complexity.
---