[
https://issues.apache.org/jira/browse/FLINK-27559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17534103#comment-17534103
]
Yun Tang commented on FLINK-27559:
----------------------------------
[~Underwood] Flink's JIRA is not a place to ask questions, please ask these
questions in the mailing list:
https://flink.apache.org/community.html#mailing-lists
> Some question about flink operator state
> ----------------------------------------
>
> Key: FLINK-27559
> URL: https://issues.apache.org/jira/browse/FLINK-27559
> Project: Flink
> Issue Type: New Feature
> Environment: Flink 1.14.4
> Reporter: Underwood
> Priority: Major
>
> I hope to get two answers to Flink's maintenance status:
>
> 1. Does custompartition support saving status? In my usage scenario, the
> partition strategy is dynamically adjusted, which depends on the data in
> datastream. I hope to make different partition strategies according to
> different data conditions.
>
> For a simple example, I want the first 100 pieces of data in datastream to be
> range partitioned and the rest of the data to be hash partitioned. At this
> time, I may need a count to identify the number of pieces of data that have
> been processed. However, in custompartition, this is only a local variable,
> so there seem to be two problems: declaring variables in this way can only be
> used in single concurrency, and it seems that they cannot be counted across
> slots; In this way, the count data will be lost during fault recovery.
>
> Although Flink already has operator state and key value state,
> custompartition is not an operator, so I don't think it can solve this
> problem through state. I've considered introducing a zookeeper to save the
> state, but the introduction of new components will make the system bloated. I
> don't know whether there is a better way to solve this problem.
>
> 2. How to make multiple operators share the same state, and even all parallel
> subtasks of different operators share the same state?
>
> For a simple example, my stream processing is divided into four stages:
> source - > mapa - > mapb - > sink. I hope to have a status count to count the
> total amount of data processed by all operators. For example, if the source
> receives one piece of data, then count + 1 when mapa is processed and count +
> 1 when mapb is processed. Finally, after this piece of data is processed, the
> value of count is 2.
>
> I don't know if there is such a state saving mechanism in Flink, which can
> meet my scenario and recover from failure at the same time. At present, we
> can still think of using zookeeper. I don't know if there is a better way.
--
This message was sent by Atlassian Jira
(v8.20.7#820007)