Pramod, how would a dynamic property change issued through an OperatorRequest from a StatsListener work with the new approach?
Thanks
- Gaurav

> On Sep 28, 2015, at 10:30 AM, Pramod Immaneni wrote:
>
> One optimization: the steps below are needed only when there is more than
> one input operator. In the single-input-operator case, which is more
> common, the property change tuple can be inserted at the next possible
> window without temporarily pausing the flow.
>
> On Mon, Sep 28, 2015 at 10:27 AM, Timothy Farkas wrote:
>
>> Furthermore, this approach is not limited to DAGs with a single input
>> operator. When a DAG has multiple input operators, property changes can
>> be applied in the same window across all input operators by enforcing
>> some synchronization at the input operator level when setting the
>> property. The synchronization would look like this:
>>
>> 1. On receiving a property change request, ask all input operators to
>>    stop and report their current window.
>> 2. Take the max window + 1 (not technically correct, but you get the
>>    idea).
>> 3. Send the property change request to all the input operators and tell
>>    them to apply the change at the maximum window id + 1.
>> 4. Resume the input operators.
>>
>> This ensures that the change is applied at the same window id, and also
>> that it is applied at a window id the input operators have never played
>> before. Therefore property changes will not interfere with the
>> idempotence of operators.
>>
>> On Mon, Sep 28, 2015 at 9:17 AM, Pramod Immaneni wrote:
>>
>>> Apex supports modification of operator properties at runtime, but the
>>> current implementation has the following shortcomings.
>>>
>>> 1. A property is not set across all partitions in the same window,
>>>    because individual partitions can be on different windows when the
>>>    property change is initiated from the client, resulting in
>>>    inconsistent data for those windows. I am being generous using the
>>>    word inconsistent.
>>> 2. Sometimes properties need to be set on more than one logical operator
>>>    at the same time to achieve the change the user is seeking. Today
>>>    these are two separate changes happening in two different windows,
>>>    again resulting in inconsistent data for some windows. They need to
>>>    happen as a single transaction.
>>> 3. If an operator fails after a property is dynamically changed but
>>>    before a committed checkpoint, the operator restarts with the old
>>>    property and the change is not re-applied.
>>>
>>> Tim and I did some brainstorming, and we have a proposal to overcome
>>> these shortcomings. The main problem in all the above cases is that
>>> property changes happen out-of-band of the data flow and hence
>>> independently of windowing. The proposal is to bring property change
>>> requests into the in-band data flow so that they are handled
>>> consistently with windowing and in a distributed manner.
>>>
>>> The idea is to inject a special property change tuple, containing the
>>> property changes and the identification of the operators they affect,
>>> into the data flow at the input operator. The tuple is injected at a
>>> window boundary, after end window and before begin window, and as it
>>> flows through the DAG the properties of the intended operators are
>>> modified. They are all modified consistently at the same window. The
>>> tuple can contain property changes for more than one logical operator,
>>> and the changes are applied consistently to those operators at the
>>> same window. In case of failure, the replay of tuples ensures that the
>>> property change is reapplied at the correct window.
>>>
>>> Please give your feedback and input on this proposal.
>>>
>>> Thanks
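To make the in-band proposal concrete, here is a minimal Python simulation. It is a sketch, not Apex code: `BeginWindow`, `EndWindow`, `PropertyChangeTuple`, `Partition`, `input_stream` and `replay_after` are all hypothetical names. It assumes a single input operator whose property change tuple is injected after end window 5 and before begin window 6, and partitions that apply only the changes addressed to their own logical operator. It contrasts this with today's out-of-band change (shortcoming 1), where two partitions sitting at different windows pick up the change at different windows, and shows a partition restored from a window-3 checkpoint reapplying the change at window 6 when upstream replays its tuples (shortcoming 3).

```python
from dataclasses import dataclass


# Hypothetical stream items; illustrative only, not Apex engine classes.
@dataclass(frozen=True)
class BeginWindow:
    wid: int


@dataclass(frozen=True)
class EndWindow:
    wid: int


@dataclass(frozen=True)
class PropertyChangeTuple:
    # logical operator name -> {property: new value}. One tuple may target
    # several logical operators so they all change in the same window.
    changes: dict


def input_stream(last_wid, change=None, inject_after=None):
    """Windows 1..last_wid from a single input operator. The control tuple,
    if given, is injected after EndWindow(inject_after) and before the next
    BeginWindow, i.e. exactly on a window boundary."""
    for wid in range(1, last_wid + 1):
        yield BeginWindow(wid)
        yield EndWindow(wid)
        if change is not None and wid == inject_after:
            yield change


def replay_after(stream, checkpoint_wid):
    """Items replayed to a partition restored from a checkpoint taken at the
    end of window checkpoint_wid (hypothetical stand-in for upstream replay)."""
    items = list(stream)
    return items[items.index(EndWindow(checkpoint_wid)) + 1:]


class Partition:
    """Hypothetical partition of a logical operator that records which
    property values were in effect during each window it processed."""

    def __init__(self, logical_name, props):
        self.logical_name = logical_name
        self.props = dict(props)
        self.history = {}  # window id -> property snapshot for that window

    def process(self, item):
        if isinstance(item, BeginWindow):
            self.history[item.wid] = dict(self.props)
        elif isinstance(item, PropertyChangeTuple):
            # Apply only the changes addressed to this logical operator.
            self.props.update(item.changes.get(self.logical_name, {}))

    def first_window_with(self, prop, value):
        return min(w for w, snap in self.history.items() if snap.get(prop) == value)


# In-band: however far apart two partitions run, both process the same item
# sequence, so both see the change at the same window boundary.
change = PropertyChangeTuple({"counter": {"threshold": 20}})
stream = list(input_stream(8, change, inject_after=5))
fast, slow = Partition("counter", {"threshold": 10}), Partition("counter", {"threshold": 10})
for item in stream:
    fast.process(item)
for item in stream:
    slow.process(item)
print(fast.first_window_with("threshold", 20), slow.first_window_with("threshold", 20))  # 6 6

# Out-of-band (shortcoming 1): the change lands wherever each partition is.
plain = list(input_stream(8))
a, b = Partition("counter", {"threshold": 10}), Partition("counter", {"threshold": 10})
for item in plain[:10]:
    a.process(item)  # a has finished window 5
for item in plain[:6]:
    b.process(item)  # b has only finished window 3
for p in (a, b):
    p.props["threshold"] = 20
for item in plain[10:]:
    a.process(item)
for item in plain[6:]:
    b.process(item)
print(a.first_window_with("threshold", 20), b.first_window_with("threshold", 20))  # 6 4

# Failure (shortcoming 3): restore from the window-3 checkpoint and replay.
restored = Partition("counter", {"threshold": 10})
for item in replay_after(stream, 3):
    restored.process(item)
print(restored.first_window_with("threshold", 20))  # 6
```

Because the change travels as a tuple between the window markers, its position relative to the data is fixed at the input operator; lag between partitions or a restart from checkpoint cannot move it to a different window.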

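The four-step synchronization Tim describes for DAGs with several input operators, together with Pramod's single-input shortcut, can be sketched as below. `InputOperatorHandle`, `coordinate_property_change` and the pause/schedule/resume/advance methods are hypothetical, not an Apex API. The sketch runs sequentially, so it does not model races or failures during the pause, and it uses the simplified "max window + 1" rule from step 2, which Tim himself notes is not technically correct.

```python
class InputOperatorHandle:
    """Hypothetical control-plane handle to one input operator (not an Apex API)."""

    def __init__(self, name, current_wid):
        self.name = name
        self.current_wid = current_wid  # window currently being emitted
        self.paused = False
        self.pending = None             # (window to inject before, or None; change)
        self.injected_before = None     # window the change tuple preceded

    def pause_and_report(self):
        self.paused = True
        return self.current_wid

    def schedule(self, apply_at_wid, change):
        self.pending = (apply_at_wid, change)

    def resume(self):
        self.paused = False

    def advance(self):
        """End the current window and begin the next one, injecting the pending
        change tuple on that boundary if it is due (None means 'next boundary')."""
        if self.paused:
            raise RuntimeError(f"{self.name} is paused")
        next_wid = self.current_wid + 1
        if self.pending is not None and self.pending[0] in (None, next_wid):
            self.injected_before = next_wid  # a real engine would emit the tuple here
            self.pending = None
        self.current_wid = next_wid


def coordinate_property_change(inputs, change):
    """Return the window id the change is injected before, or None when a single
    input operator is left to use its next window boundary."""
    if len(inputs) == 1:
        # Single input operator: no pause, just use the next boundary.
        inputs[0].schedule(None, change)
        return None
    # 1. Pause every input operator and collect its current window.
    windows = [op.pause_and_report() for op in inputs]
    # 2. Pick a window that no input operator has emitted yet.
    target = max(windows) + 1
    # 3. Tell every input operator to inject the change before that window.
    for op in inputs:
        op.schedule(target, change)
    # 4. Resume the input operators.
    for op in inputs:
        op.resume()
    return target


in1, in2 = InputOperatorHandle("in1", 41), InputOperatorHandle("in2", 44)
target = coordinate_property_change([in1, in2], {"counter": {"threshold": 20}})
for op in (in1, in2):
    while op.injected_before is None:
        op.advance()
print(target, in1.injected_before, in2.injected_before)  # 45 45 45

solo = InputOperatorHandle("solo", 7)
coordinate_property_change([solo], {"counter": {"threshold": 20}})
solo.advance()
print(solo.injected_before)  # 8
```

The lagging input operator (`in1`, at window 41) simply keeps emitting until it reaches the agreed boundary, so both inputs inject before window 45, a window neither had played when the request arrived.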