[
https://issues.apache.org/jira/browse/FLINK-30413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yufan Sheng updated FLINK-30413:
--------------------------------
Description:
A lot of Pulsar connector test unstable issues are related to {{Shared}} and
{{Key_Shared}} subscription. Because this two subscription is designed to
consume the records in an unordered way. And we can support multiple consumers
in same topic partition. But this feature lead to some drawbacks in connector.
1. Performance
Flink is a true stream processor with high correctness support. But support
multiple consumer will require higher correctness which depends on Pulsar
transaction. But the internal implementation of Pulsar transaction on source is
record the message one by one and stores all the pending ack status in client
side. Which is slow and memory inefficient.
This means that we can only use {{Shared}} and {{Key_Shared}} on Flink with low
throughput. This against our intention to support these two subscription.
Because adding multiple consumer to same partition can increase the consuming
speed.
2. Unstable
Pulsar transaction acknowledge the messages one by one in an internal Pulsar's
topic. But it's not stable enough to get it works. A lot of pending issues in
Flink JIRA are related to Pulsar transaction and we don't have any workaround.
3. Complex
Support {{Shared}} and {{Key_Shared}} subscription make the connector's code
more complex than we expect. We have to make every part of code into ordered
and unordered way. Which is hard to understand for the maintainer.
4. Necessary
The current implementation on {{Shared}} and {{Key_Shared}} is completely
unusable to use in Production environment. For the user, this function is not
necessary. Because there is no bottleneck in consuming data from Pulsar, the
bottleneck is in processing the data, which we can achieve by increasing the
parallelism of the processing operator.
was:
A lot of Pulsar connector test unstable issues are related to {{Shared}} and
{{Key_Shared}} subscription. Because this two subscription is designed to
consume the records in an unordered way. And we can support multiple consumers
in same topic partition. But this feature lead to some drawbacks in connector.
1. Performance
Flink is a true stream processor with high correctness support. But support
multiple consumer will require higher correctness which depends on Pulsar
transaction. But the internal implementation of Pulsar transaction on source is
record the message one by one and stores all the pending ack status in client
side. Which is slow and memory inefficient.
This means that we can only use {{Shared}} and {{Key_Shared}} on Flink with low
throughput. This against our intention to support these two subscription.
Because adding multiple consumer to same partition can increase the consuming
speed.
2. Unstable
Pulsar transaction acknowledge the messages one by one in an internal Pulsar's
topic
> Drop Share and Key_Shared subscription support in Pulsar connector
> ------------------------------------------------------------------
>
> Key: FLINK-30413
> URL: https://issues.apache.org/jira/browse/FLINK-30413
> Project: Flink
> Issue Type: Improvement
> Components: Connectors / Pulsar
> Affects Versions: 1.17.0
> Reporter: Yufan Sheng
> Priority: Critical
> Fix For: pulsar-4.0.0
>
>
> A lot of Pulsar connector test unstable issues are related to {{Shared}} and
> {{Key_Shared}} subscription. Because this two subscription is designed to
> consume the records in an unordered way. And we can support multiple
> consumers in same topic partition. But this feature lead to some drawbacks in
> connector.
> 1. Performance
> Flink is a true stream processor with high correctness support. But support
> multiple consumer will require higher correctness which depends on Pulsar
> transaction. But the internal implementation of Pulsar transaction on source
> is record the message one by one and stores all the pending ack status in
> client side. Which is slow and memory inefficient.
> This means that we can only use {{Shared}} and {{Key_Shared}} on Flink with
> low throughput. This against our intention to support these two subscription.
> Because adding multiple consumer to same partition can increase the consuming
> speed.
> 2. Unstable
> Pulsar transaction acknowledge the messages one by one in an internal
> Pulsar's topic. But it's not stable enough to get it works. A lot of pending
> issues in Flink JIRA are related to Pulsar transaction and we don't have any
> workaround.
> 3. Complex
> Support {{Shared}} and {{Key_Shared}} subscription make the connector's code
> more complex than we expect. We have to make every part of code into ordered
> and unordered way. Which is hard to understand for the maintainer.
> 4. Necessary
> The current implementation on {{Shared}} and {{Key_Shared}} is completely
> unusable to use in Production environment. For the user, this function is not
> necessary. Because there is no bottleneck in consuming data from Pulsar, the
> bottleneck is in processing the data, which we can achieve by increasing the
> parallelism of the processing operator.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)