MartijnVisser commented on a change in pull request #18781:
URL: https://github.com/apache/flink/pull/18781#discussion_r808836654
##########
File path: docs/content/docs/connectors/datastream/pulsar.md
##########
@@ -417,4 +417,16 @@ If you have a problem with Pulsar when using Flink, keep
in mind that Flink only
and your problem might be independent of Flink and sometimes can be solved by
upgrading Pulsar brokers,
reconfiguring Pulsar brokers or reconfiguring Pulsar connector in Flink.
+### Source stop reading records for about 10s when data volume is small
Review comment:
Will you also add this section to the Chinese documentation?
##########
File path: docs/content/docs/connectors/datastream/pulsar.md
##########
@@ -417,4 +417,16 @@ If you have a problem with Pulsar when using Flink, keep
in mind that Flink only
and your problem might be independent of Flink and sometimes can be solved by
upgrading Pulsar brokers,
reconfiguring Pulsar brokers or reconfiguring Pulsar connector in Flink.
+### Source stop reading records for about 10s when data volume is small
Review comment:
```suggestion
### Messages can be delayed on low volume topics
```
##########
File path: docs/content/docs/connectors/datastream/pulsar.md
##########
@@ -417,4 +417,16 @@ If you have a problem with Pulsar when using Flink, keep
in mind that Flink only
and your problem might be independent of Flink and sometimes can be solved by
upgrading Pulsar brokers,
reconfiguring Pulsar brokers or reconfiguring Pulsar connector in Flink.
+### Source stop reading records for about 10s when data volume is small
+
+When source connector read from a low volume topic, users might observe a 10s
interval between
+messages. Pulsar source by default buffers messages from pulsar topic before
emitting to downstream
+operators until either buffered records has reached
`PulsarSourceOptions.PULSAR_MAX_FETCH_RECORDS`
+or waiting time has reached `PulsarSourceOptions.PULSAR_MAX_FETCH_TIME`
(whichever comes first).
+When data volumes is small, e.g, 4 records per second, source will wait
`PULSAR_MAX_FETCH_TIME`
+(default to 10s) before emit the records. Change either of the 2 options if
you want to avoid this
+behaviour.
Review comment:
```suggestion
When the Pulsar source connector reads from a low volume topic, users might
observe a 10-second delay between messages. By default, the Pulsar source
buffers messages from topics before emitting them to downstream operators.
Buffered records are emitted once their number is equal to or larger than
`PulsarSourceOptions.PULSAR_MAX_FETCH_RECORDS`, or once the waiting time
reaches `PulsarSourceOptions.PULSAR_MAX_FETCH_TIME` (defaults to 10 seconds),
whichever comes first. If the data volume is low, filling the buffer can take
longer than `PULSAR_MAX_FETCH_TIME`, so the messages are only emitted after
this time has passed.
To avoid this behaviour, change either the maximum number of buffered records
or the maximum waiting time.
```
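To make the "whichever comes first" rule easier to check, here is a minimal sketch of the buffering behaviour described above. It is a simplified model and not the connector's code: the class `FetchBufferSketch`, the method `emitTimes`, and the record limit of 100 are hypothetical names and values chosen for illustration, and the model assumes arrival times are sorted and that a new fetch window starts as soon as the previous batch is emitted.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of the buffering rule; not the connector's actual code.
class FetchBufferSketch {

    /**
     * Returns the time (in ms) at which each batch would be handed downstream.
     * arrivalsMs must be sorted ascending (assumption of this sketch).
     */
    static List<Long> emitTimes(long[] arrivalsMs, int maxFetchRecords, long maxFetchTimeMs) {
        if (maxFetchRecords <= 0 || maxFetchTimeMs <= 0) {
            throw new IllegalArgumentException("both limits must be positive");
        }
        List<Long> emits = new ArrayList<>();
        long windowStart = 0L;
        int next = 0;
        while (next < arrivalsMs.length) {
            long deadline = windowStart + maxFetchTimeMs;
            long flushAt = deadline; // time limit applies unless the record limit is hit first
            int buffered = 0;
            while (next < arrivalsMs.length && arrivalsMs[next] <= deadline) {
                buffered++;
                long arrivedAt = arrivalsMs[next++];
                if (buffered >= maxFetchRecords) {
                    flushAt = arrivedAt; // record limit reached before the deadline
                    break;
                }
            }
            if (buffered > 0) {
                emits.add(flushAt);
            }
            windowStart = flushAt; // the next fetch window starts after this one ends
        }
        return emits;
    }

    public static void main(String[] args) {
        // 4 records per second for 10 seconds, as in the example from the docs text.
        long[] arrivals = new long[40];
        for (int k = 0; k < arrivals.length; k++) {
            arrivals[k] = 250L * (k + 1);
        }
        // Assumed record limit of 100 with a 10 s time limit: one batch, emitted at the deadline.
        System.out.println(emitTimes(arrivals, 100, 10_000L));
        // Lowering the time limit to 1 s: a batch every second.
        System.out.println(emitTimes(arrivals, 100, 1_000L));
        // Lowering the record limit to 4 instead: also a batch every second.
        System.out.println(emitTimes(arrivals, 4, 10_000L));
    }
}
```

In this model the 40 records are held until the 10-second deadline in the first case, while lowering either limit brings the delay down to about one second. That matches the advice in the suggestion to change either of the two options.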