[
https://issues.apache.org/jira/browse/FLINK-10122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16583549#comment-16583549
]
ASF GitHub Bot commented on FLINK-10122:
----------------------------------------
StefanRRichter commented on issue #6537: [FLINK-10122] KafkaConsumer should use
partitionable state over union state if partition discovery is not active
URL: https://github.com/apache/flink/pull/6537#issuecomment-413787110
Thanks @tzulitai! I was aware that this would break the behavior for
partition discovery. However, the current implementation was already broken for
users at large scale, as pointed out in the description. This PR was intended as
a quick solution for that case. I think that we can have better, non-breaking
solutions in the future, such as splitting the source into two operators or using
a different state partitioning scheme. I think that we can close the PR and go
for the long-term solution in official releases. Nevertheless, I think that we
should cherry-pick two parts of this PR into releases: the hotfix to improve
memory utilization and the option to remove operator state (or - even better -
states in general).
> KafkaConsumer should use partitionable state over union state if partition
> discovery is not active
> --------------------------------------------------------------------------------------------------
>
> Key: FLINK-10122
> URL: https://issues.apache.org/jira/browse/FLINK-10122
> Project: Flink
> Issue Type: Improvement
> Components: Kafka Connector
> Reporter: Stefan Richter
> Assignee: Stefan Richter
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.7.0
>
>
> KafkaConsumer always stores its offset state as union state. I think this is
> only required when partition discovery is active. For jobs with a
> very high parallelism, the union state can lead to prohibitively expensive
> deployments. For example, a job with 2000 source subtasks and a total of 10MB
> of checkpointed union offset state would have to ship ~ 2000 x 10MB =
> 20GB of state. With partitionable state, it would have to ship ~10MB.
> For now, I suggest going back to partitionable state when
> partition discovery is not active. In the long run, I have some ideas for
> more efficient partitioning schemes that would also work for active discovery.
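
The arithmetic in the description can be reproduced with a toy model. The sketch below is plain Java, not Flink code: the class name, the method names, and the choice of 4000 entries of 2500 bytes each (10MB in total, decimal units) are all assumptions made for illustration. It assumes that union redistribution hands the complete list of state entries to every subtask, while split redistribution hands each entry to exactly one subtask; round-robin assignment is used here as a stand-in for whatever split scheme the runtime actually applies.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model of operator state redistribution on restore (hypothetical, not Flink code). */
final class StateRedistributionSketch {

    /** Union redistribution: every subtask is handed the complete list of entries. */
    static List<List<Long>> union(List<Long> entrySizes, int parallelism) {
        List<List<Long>> perSubtask = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            perSubtask.add(entrySizes); // the same full list for each subtask
        }
        return perSubtask;
    }

    /** Split redistribution: each entry goes to exactly one subtask (round-robin stand-in). */
    static List<List<Long>> split(List<Long> entrySizes, int parallelism) {
        List<List<Long>> perSubtask = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            perSubtask.add(new ArrayList<>());
        }
        for (int i = 0; i < entrySizes.size(); i++) {
            perSubtask.get(i % parallelism).add(entrySizes.get(i));
        }
        return perSubtask;
    }

    /** Sums the bytes handed to all subtasks, i.e. the state that has to be shipped. */
    static long totalBytesShipped(List<List<Long>> perSubtask) {
        long total = 0;
        for (List<Long> entries : perSubtask) {
            for (long size : entries) {
                total += size;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Assumed layout: 4000 entries of 2500 bytes each = 10MB of offset state.
        List<Long> entrySizes = Collections.nCopies(4000, 2500L);
        int parallelism = 2000;

        System.out.println("union: " + totalBytesShipped(union(entrySizes, parallelism)) + " bytes");
        System.out.println("split: " + totalBytesShipped(split(entrySizes, parallelism)) + " bytes");
    }
}
```

Under these assumptions the union case ships 2000 x 10MB = 20GB in total, while the split case ships the 10MB once, which matches the figures in the description.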