[ 
https://issues.apache.org/jira/browse/FLINK-15969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai updated FLINK-15969:
----------------------------------------
    Description: 
Currently in Stateful Functions, {{PersistedValue}} and {{PersistedTable}} are 
multiplexed under a single {{MapState}}. I propose to split them up and have 
them multiplexed under 2 separate {{MapState}} handles, for the following reasons:
* There's already a problem with the (to-be-introduced) state reader / 
analyzer, that to read a single function's persisted state values, you have to 
iterate through ALL keys (which includes state of other functions) since we 
multiplex everything into a single handle.
* If you multiplex both tables and values into a single state handle, this will 
become even more of a problem in the future, say when the user just wants to 
read table state and not value state.
* If we do decide to separate the handles, we can slim down the 
{{MultiplexedStateKey}} type a bit, by having a separate 
{{MultiplexedTableStateKey}} that has a {{ByteString userKey}} field and a 
{{MultiplexedStateKey prefix}} field. There's already a minor concern with the 
way we use {{MultiplexedStateKey}}: Do protobuf repeated fields require some 
extra metadata to be written? If yes, it's a tad redundant size-wise in this 
case, since we only ever have 1 user key added.
* When multiplexing both value states and table state under the same state 
handle, the key is essentially ambiguous - there is a possibility that a value 
state's key in {{MapState}} can be set up to overwrite another key of a table 
state.
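
The key ambiguity in the last point can be sketched in plain Java. This is only an illustration under stated assumptions: the two {{HashMap}}s stand in for {{MapState}} handles, and the string-concatenation key encoding ({{valueKey}}, {{tableKey}}) is hypothetical and is not the actual {{MultiplexedStateKey}} layout.

```java
import java.util.HashMap;
import java.util.Map;

class MultiplexingSketch {

    // Hypothetical flat key encoding, for illustration only.
    // This is NOT the real MultiplexedStateKey layout.
    static String valueKey(String function, String stateName) {
        return function + "/" + stateName;
    }

    // Hypothetical table key: the value-style prefix plus the user key.
    static String tableKey(String function, String tableName, String userKey) {
        return function + "/" + tableName + "/" + userKey;
    }

    /** One shared handle: true if a value write clobbers a table entry. */
    static boolean sharedHandleCollides() {
        Map<String, String> shared = new HashMap<>(); // stand-in for a single MapState
        shared.put(tableKey("fn", "table", "k"), "table-entry");
        // A value state named "table/k" encodes to the same key as the table entry.
        shared.put(valueKey("fn", "table/k"), "value");
        return shared.size() == 1
                && "value".equals(shared.get(tableKey("fn", "table", "k")));
    }

    /** Two handles: true if the table entry was affected by the value write. */
    static boolean separateHandlesCollide() {
        Map<String, String> values = new HashMap<>(); // stand-in for the value MapState
        Map<String, String> tables = new HashMap<>(); // stand-in for the table MapState
        tables.put(tableKey("fn", "table", "k"), "table-entry");
        values.put(valueKey("fn", "table/k"), "value");
        return !"table-entry".equals(tables.get(tableKey("fn", "table", "k")));
    }

    public static void main(String[] args) {
        System.out.println("shared handle collides: " + sharedHandleCollides());
        System.out.println("separate handles collide: " + separateHandlesCollide());
    }
}
```

With separate handles, value keys and table keys live in different key spaces, so an overlap in their encodings can no longer cause one kind of state to overwrite the other. A reader interested only in table state would also iterate over just the table handle.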

  was:
Currently in Stateful Functions, {{PersistedValue}}s and {{PersistedTable}}s 
are multiplexed under a single {{MapState}}. I propose to split them up, and 
have them multiplexed with 2 separate {{MapState}}s, for the following reasons:
* There's already a problem with the (to-be-introduced) state reader / 
analyzer, that to read a single function's persisted state values, you have to 
iterate through ALL keys (which includes state of other functions) since we 
multiplex everything into a single handle.
* If you multiplex both tables and values into a single state handle, this will 
become even more of a problem in the future, say when the user just wants to 
read table state and not value state.
* If we do decide to separate the handles, we can slim down the 
{{MultiplexedStateKey}} type a bit, by having a separate 
{{MultiplexedTableStateKey}} that has a {{ByteString userKey}} field and a 
{{MultiplexedStateKey prefix}} field. There's already a minor concern with the 
way we use {{MultiplexedStateKey}}: Do protobuf repeated fields require some 
extra metadata to be written? If yes, it's a tad redundant size-wise in this 
case, since we only ever have 1 user key added.
* When multiplexing both value states and table state under the same state 
handle, the key is essentially ambiguous - there is a possibility that a value 
state's key in {{MapState}} can be set up to overwrite another key of a table 
state.


> Do not multiplex both PersistedValue and PersistedTable with a single 
> MapState state handle
> -------------------------------------------------------------------------------------------
>
>                 Key: FLINK-15969
>                 URL: https://issues.apache.org/jira/browse/FLINK-15969
>             Project: Flink
>          Issue Type: Improvement
>          Components: Stateful Functions
>    Affects Versions: statefun-1.1
>            Reporter: Tzu-Li (Gordon) Tai
>            Assignee: Tzu-Li (Gordon) Tai
>            Priority: Major
>
> Currently in Stateful Functions, {{PersistedValue}} and {{PersistedTable}} 
> are multiplexed under a single {{MapState}}. I propose to split them up and 
> have them multiplexed under 2 separate {{MapState}} handles, for the following 
> reasons:
> * There's already a problem with the (to-be-introduced) state reader / 
> analyzer, that to read a single function's persisted state values, you have 
> to iterate through ALL keys (which includes state of other functions) since 
> we multiplex everything into a single handle.
> * If you multiplex both tables and values into a single state handle, this 
> will become even more of a problem in the future, say when the user just 
> wants to read table state and not value state.
> * If we do decide to separate the handles, we can slim down the 
> {{MultiplexedStateKey}} type a bit, by having a separate 
> {{MultiplexedTableStateKey}} that has a {{ByteString userKey}} field and a 
> {{MultiplexedStateKey prefix}} field. There's already a minor concern with 
> the way we use {{MultiplexedStateKey}}: Do protobuf repeated fields require 
> some extra metadata to be written? If yes, it's a tad redundant size-wise in 
> this case, since we only ever have 1 user key added.
> * When multiplexing both value states and table state under the same state 
> handle, the key is essentially ambiguous - there is a possibility that a 
> value state's key in {{MapState}} can be set up to overwrite another key of a 
> table state.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)