#2 could be accomplished with a convenience composite, yes? On Wed, Apr 25, 2018, 18:28 Xinyu Liu <[email protected]> wrote:
> @Robert: for your questions: > > 1) Side input won't work for us since it returns the whole collection. We > use rocksDb and usually the state is too big to fit in memory. > > 2) One way to achieve our use cases is to assign a single key to all the > elements so they will be associated with the same keyed state. The state > will belong to the element window as it is. Kenneth mentioned this solution > too. It does meet our use case, but it's not very convenient to our users. > > 3) Sorry if I wasn't clear about the use case. For our usage, it's pretty > common to store the elements in the states, and look them up later and do > some computation. The elements will be in the same window, but doesn't need > to be of the same key. > > Thanks, > Xinyu > > On Wed, Apr 25, 2018 at 6:02 PM, Robert Bradshaw <[email protected]> > wrote: > >> On Wed, Apr 25, 2018 at 5:45 PM Xinyu Liu <[email protected]> wrote: >> >> > Hi, >> >> > I am working on adding the stateful ParDo to the upcoming BEAM Samza >> runner, and realized that the state for each ParDo processElement() is not >> only associated with the window of the element, but also the key of the >> element. Chatted with Kenneth over email about this design decision, which >> has the following benefits for keyed state: >> >> > 1) No synchronization >> > 2) Simple programming model >> > 3) No communication between works >> >> > The current design doesn't support accessing the state across different >> keys, which seems to be a more general use case. This use case is also >> very >> common inside LinkedIn where the users have access to the entire state of >> an operator/task, and performing lookups and computations on top of it. >> It's quite hard to make every user here aware that the state is also >> tightly associated with key of the element.. >> >> Would side inputs be applicable here? (They're read-only, but other than >> that seem to fit the need.) >> >> > From the stateful ParDo API the state looks pretty general too. I am >> wondering is it possible to extend the current API to support both keyed >> and non-keyed state? Even internally BEAM assigns a dummy key for to >> associate the state with all the elements. It will be very beneficial to >> existing Samza users and help them adopt BEAM. >> >> Could you clarify how you would use this dummy key? You could manually add >> a random key, but in that case it's unlikely that any state stored would >> get observed again. Across what scope were you thinking state would be >> stored? The lifetime of the bundle? The worker? The job? How are >> conflicting writes resolved? >> >> Perhaps rather than describing the mechanism (state) that you're trying to >> use, it'd be helpful to describe the kinds of computations you're trying >> to >> perform, to figure out how the model should be adapted/extended if it >> doesn't meet those needs. >> > >
