Hi Jungtaek,

1. Right now users need to vary the database/table name in the state provider 
config per topology. Agree, its better to include topology in the namespace.

2. IMO, users should have the flexibility to store multiple key-values in the 
state in the core API. As of now storm only supports rebalancing the tasks 
(executor count remains the same). So users need not worry about re-sharding 
their state as long as they use the right grouping. I think this flexibility 
also helps us to provide useful abstractions on top.

3. Re-sharding would be based on the keys (re-hashing). The component id, 
task-id would be used to map to the namespace. So I am not clear about the 
concern you have raised.

We could support dynamic state re-sharding via underlying higher level 
abstractions (e.g. Streams API) where we control the namespaces, keys etc and 
would be more manageable. IMO before we do this, we can introduce stateful 
exactly once processing via the streams API as the first step.

Thanks,
Arun



On 2/19/18, 4:59 PM, "Jungtaek Lim" <[email protected]> wrote:

>Hi,
>
>I'd like to verify my observation on current State implementation is
>correct, so that we could fix them if necessary and make plan for
>improvement.
>
>1. State is stored with namespace prefix which typically composes to
>(component id, task id) pair and it doesn't look like having classification
>for topology. Is this correct observation? If then I think that's worth to
>call it as 'critical' and it must be fixed.
>
>2. We're allowing end-users to put key of state, and also no restriction
>for grouping on stateful component. I feel such flexibility breaks the
>possibility to reshard state and end-users are required to implement their
>own reshard tool according to their topology state key distribution logic.
>I expect it will not happen on streams API (since it should be done with
>keyed stream) but wouldn't it better to also restrict such flexibility also
>for core API?
>
>3. Suppose we are going to support state resharding (for allowing change of
>parallelism) and we restrict to apply field grouping with key while
>connecting stateful component.
>Then key-value can be moved based on key (though finding and replacing task
>id may not be trivial if component name has '-'... we have same issue on
>metric name, so maybe time to restrict characters on topology name as well
>as component name?).
>Is it also true for window/partition/windowsystem state? I didn't take a
>deep look on window state (I would find a time) but it would be great if
>someone knowing the detail makes it clear.
>
>Thanks in advance,
>Jungtaek Lim (HeartSaVioR)

Reply via email to