GitHub user PramodSSImmaneni commented on a diff in the pull request:
https://github.com/apache/incubator-apex-malhar/pull/132#discussion_r48581219
--- Diff:
contrib/src/main/java/com/datatorrent/contrib/kafka/AbstractKafkaInputOperator.java
---
@@ -392,9 +412,20 @@ public void emitTuples()
if (maxTuplesPerWindow > 0) {
count = Math.min(count, maxTuplesPerWindow - emitCount);
}
- for (int i = 0; i < count; i++) {
+ if (count > 0) {
+ // If the total size transmitted in the window will be exceeded don't transmit anymore messages in this window
+ // Make an exception for the case when no message has been transmitted in the window and transmit at least one
+ // message even if the message size is greater than max message size so that the processing doesn't get stuck
+ if ((emitCount > 0) && ((maxTotalMsgSizePerWindow - emitTotalMsgSize) < consumer.peekMessage().msg.size())) {
+ return;
+ }
+ }
+ int numEmitted = 0;
+ for (int i = 0; i < count; i++) {
KafkaConsumer.KafkaMessage message = consumer.pollMessage();
emitTuple(message.msg);
+ ++numEmitted;
--- End diff --
Oh, OK. On second thought, I think it is better to poll the message and keep
it locally. That avoids two locked reads from the queue (a peek followed by a
poll) and the associated performance impact. In the other case there would
anyway be a scenario where the message has already been taken out of the queue
in emitTuples, new messages get added to the holdingBuffer until it is full,
and one message more than the limit ends up outstanding. I have made those
changes.
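
For readers following the thread, here is a minimal sketch of the
"poll once and hold locally" idea described above. It is not the code from the
pull request: the class HeldMessageEmitter, the pendingMessage field, the
offer() helper and the use of byte[] payloads are all hypothetical names chosen
for illustration, and the size check is applied per message rather than once
before the loop as in the quoted diff.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical stand-in for the operator; not the actual AbstractKafkaInputOperator.
class HeldMessageEmitter {
    private final BlockingQueue<byte[]> holdingBuffer;
    private final long maxTotalMsgSizePerWindow;
    private byte[] pendingMessage;      // polled from the queue but not yet emitted
    private long emitTotalMsgSize;      // bytes emitted in the current window
    private int emitCount;              // messages emitted in the current window
    final List<byte[]> emitted = new ArrayList<>();

    HeldMessageEmitter(int bufferCapacity, long maxTotalMsgSizePerWindow) {
        this.holdingBuffer = new ArrayBlockingQueue<>(bufferCapacity);
        this.maxTotalMsgSizePerWindow = maxTotalMsgSizePerWindow;
    }

    // Producer side: returns false when the buffer is full.
    boolean offer(byte[] msg) {
        return holdingBuffer.offer(msg);
    }

    void beginWindow() {
        emitCount = 0;
        emitTotalMsgSize = 0;
    }

    void emitTuples(int count) {
        for (int i = 0; i < count; i++) {
            // Single locked read: poll only when nothing is held from earlier.
            if (pendingMessage == null) {
                pendingMessage = holdingBuffer.poll();
            }
            if (pendingMessage == null) {
                return; // queue is empty
            }
            // Stop if the size budget would be exceeded, unless nothing has been
            // emitted yet in this window (always emit at least one message).
            if (emitCount > 0
                && (maxTotalMsgSizePerWindow - emitTotalMsgSize) < pendingMessage.length) {
                return; // message stays held for the next window
            }
            emitted.add(pendingMessage);
            emitTotalMsgSize += pendingMessage.length;
            emitCount++;
            pendingMessage = null;
        }
    }

    public static void main(String[] args) {
        HeldMessageEmitter e = new HeldMessageEmitter(10, 10);
        e.offer(new byte[6]);
        e.offer(new byte[6]);
        e.beginWindow();
        e.emitTuples(2);
        System.out.println("emitted in first window: " + e.emitted.size());
    }
}
```

While a message is held in pendingMessage it no longer occupies a slot in the
buffer, so the producer can fill the buffer again. That is the "one extra
message than limit" case mentioned in the comment.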