Hi,

I'd like to verify that my observations on the current State implementation
are correct, so that we can fix any problems and plan improvements.

1. State is stored under a namespace prefix that is typically composed of a
(component id, task id) pair, and it doesn't appear to include anything that
identifies the topology. Is this observation correct? If so, I think it's
worth calling 'critical' and it must be fixed.

2. We allow end users to choose the state key freely, and we place no
restriction on the grouping used for a stateful component. I feel this
flexibility rules out resharding state generically, so end users would have
to implement their own reshard tool according to their topology's state key
distribution logic. I expect this will not happen with the streams API
(since state there should be tied to a keyed stream), but wouldn't it be
better to restrict this flexibility in the core API as well?

3. Suppose we are going to support state resharding (to allow changing
parallelism) and we require a fields grouping on the key when connecting a
stateful component.
Then key-value pairs can be moved based on key (though finding and replacing
the task id may not be trivial if the component name contains '-'... we have
the same issue with metric names, so maybe it's time to restrict the
characters allowed in topology names as well as component names?).
Does this also hold for window/partition/windowsystem state? I haven't taken
a deep look at window state (I will find time to), but it would be great if
someone who knows the details could clarify.
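
To make points 1 and 3 a bit more concrete, here is a rough sketch in
Python. It is purely illustrative and not the actual Storm code: the '-'
separator, the namespace() helper, the CRC32-based target_task() and the
reshard() function are all assumptions of mine, standing in for whatever
the real namespace layout and fields-grouping hash turn out to be.

```python
import zlib

SEP = "-"  # assumed separator between namespace parts


def namespace(component_id, task_id, topology_id=None):
    """Build a state namespace (hypothetical layout, for illustration only)."""
    parts = [component_id, str(task_id)]
    if topology_id is not None:
        parts.insert(0, topology_id)
    return SEP.join(parts)


def target_task(key, parallelism):
    """Stand-in for a fields-grouping hash: CRC32 of the key modulo task count."""
    return zlib.crc32(key.encode("utf-8")) % parallelism


def reshard(states, new_parallelism):
    """Redistribute {task_index: {key: value}} across new_parallelism tasks."""
    result = {task: {} for task in range(new_parallelism)}
    for key_values in states.values():
        for key, value in key_values.items():
            result[target_task(key, new_parallelism)][key] = value
    return result


# Point 1: without a topology part, two topologies that both have a
# component "counter" with task 3 would share the same namespace.
print(namespace("counter", 3))

# Point 3: once names may contain the separator, the namespace can no
# longer be split back unambiguously into its parts.
print(namespace("b", 3, "topo-a") == namespace("a-b", 3, "topo"))
```

Under these assumptions reshard() only works because every key is routed by
the same function used for the grouping. With free-form keys and arbitrary
groupings (point 2) there is no such function to rely on.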

Thanks in advance,
Jungtaek Lim (HeartSaVioR)
