Re: SplittableDoFn

Alex Van Boxel Tue, 02 Oct 2018 14:19:39 -0700

Don't want to crash the tech discussion here, but... I just gave a session
at the Beam Summit about Splittable DoFn's as a users perspective (from
things I could gather from the documentation and experimentation). Her is
the slides deck, maybe it could be useful:
https://docs.google.com/presentation/d/1dSc6oKh5pZItQPB_QiUyEoLT2TebMnj-pmdGipkVFPk/edit?usp=sharing
(quite
proud of the animations though ;-)


 _/
_/ Alex Van Boxel


On Thu, Sep 27, 2018 at 12:04 AM Lukasz Cwik <[email protected]> wrote:

> Reuven, just inside the restriction tracker itself which is scoped per
> executing SplittableDoFn. A user could incorrectly write the
> synchronization since they are currently responsible for writing it though.
>
> On Wed, Sep 26, 2018 at 2:51 PM Reuven Lax <[email protected]> wrote:
>
>> is synchronization over an entire work item, or just inside restriction
>> tracker? my concern is that some runners (especially streaming runners)
>> might have hundreds or thousands of parallel work items being processed for
>> the same SDF (for different keys), and I'm afraid of creating
>> lock-contention bottlenecks.
>>
>> On Fri, Sep 21, 2018 at 3:42 PM Lukasz Cwik <[email protected]> wrote:
>>
>>> The synchronization is related to Java thread safety since there is
>>> likely to be concurrent access needed to a restriction tracker to properly
>>> handle accessing the backlog and splitting concurrently from when the users
>>> DoFn is executing and updating the restriction tracker. This is similar to
>>> the Java thread safety needed in BoundedSource and UnboundedSource for
>>> fraction consumed, backlog bytes, and splitting.
>>>
>>> On Fri, Sep 21, 2018 at 2:38 PM Reuven Lax <[email protected]> wrote:
>>>
>>>> Can you give details on what the synchronization is per? Is it per key,
>>>> or global to each worker?
>>>>
>>>> On Fri, Sep 21, 2018 at 2:10 PM Lukasz Cwik <[email protected]> wrote:
>>>>
>>>>> As I was looking at the SplittableDoFn API while working towards
>>>>> making a proposal for how the backlog/splitting API could look, I found
>>>>> some sharp edges that could be improved.
>>>>>
>>>>> I noticed that:
>>>>> 1) We require users to write thread safe code, this is something that
>>>>> we haven't asked of users when writing a DoFn.
>>>>> 2) We "internal" methods within the RestrictionTracker that are not
>>>>> meant to be used by the runner.
>>>>>
>>>>> I can fix these issues by giving the user a forwarding restriction
>>>>> tracker[1] that provides an appropriate level of synchronization as needed
>>>>> and also provides the necessary observation hooks to see when a claim
>>>>> failed or succeeded.
>>>>>
>>>>> This requires a change to our experimental API since we need to pass
>>>>> a RestrictionTracker to the @ProcessElement method instead of a sub-type 
>>>>> of
>>>>> RestrictionTracker.
>>>>> @ProcessElement
>>>>> processElement(ProcessContext context, OffsetRangeTracker tracker) {
>>>>> ... }
>>>>> becomes:
>>>>> @ProcessElement
>>>>> processElement(ProcessContext context, RestrictionTracker<OffsetRange,
>>>>> Long> tracker) { ... }
>>>>>
>>>>> This provides an additional benefit that it prevents users from
>>>>> working around the RestrictionTracker APIs and potentially making
>>>>> underlying changes to the tracker outside of the tryClaim call.
>>>>>
>>>>> Full implementation is available within this PR[2] and was wondering
>>>>> what people thought.
>>>>>
>>>>> 1:
>>>>> https://github.com/apache/beam/pull/6467/files#diff-ed95abb6bc30a9ed07faef5c3fea93f0R72
>>>>> 2: https://github.com/apache/beam/pull/6467
>>>>>
>>>>>
>>>>> On Mon, Sep 17, 2018 at 12:45 PM Lukasz Cwik <[email protected]> wrote:
>>>>>
>>>>>> The changes to the API have not been proposed yet. So far it has all
>>>>>> been about what is the representation and why.
>>>>>>
>>>>>> For splitting, the current idea has been about using the backlog as a
>>>>>> way of telling the SplittableDoFn where to split, so it would be in terms
>>>>>> of whatever the SDK decided to report.
>>>>>> The runner always chooses a number for backlog that is relative to
>>>>>> the SDKs reported backlog. It would be upto the SDK to round/clamp the
>>>>>> number given by the Runner to represent something meaningful for itself.
>>>>>> For example if the backlog that the SDK was reporting was bytes
>>>>>> remaining in a file such as 500, then the Runner could provide some value
>>>>>> like 212.2 which the SDK would then round to 212.
>>>>>> If the backlog that the SDK was reporting 57 pubsub messages, then
>>>>>> the Runner could provide a value like 300 which would mean to read 57
>>>>>> values and then another 243 as part of the current restriction.
>>>>>>
>>>>>> I believe that BoundedSource/UnboundedSource will have wrappers added
>>>>>> that provide a basic SplittableDoFn implementation so existing IOs should
>>>>>> be migrated over without API changes.
>>>>>>
>>>>>> On Mon, Sep 17, 2018 at 1:11 AM Ismaël Mejía <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks a lot Luke for bringing this back to the mailing list and
>>>>>>> Ryan for taking
>>>>>>> the notes.
>>>>>>>
>>>>>>> I would like to know if there was some discussion, or if you guys
>>>>>>> have given
>>>>>>> some thought to the required changes in the SDK (API) part. What
>>>>>>> will be the
>>>>>>> equivalent of `splitAtFraction` and what should IO authors do to
>>>>>>> support it..
>>>>>>>
>>>>>>> On Sat, Sep 15, 2018 at 1:52 AM Lukasz Cwik <[email protected]>
>>>>>>> wrote:
>>>>>>> >
>>>>>>> > Thanks to everyone who joined and for the questions asked.
>>>>>>> >
>>>>>>> > Ryan graciously collected notes of the discussion:
>>>>>>> https://docs.google.com/document/d/1kjJLGIiNAGvDiUCMEtQbw8tyOXESvwGeGZLL-0M06fQ/edit?usp=sharing
>>>>>>> >
>>>>>>> > The summary was that bringing BoundedSource/UnboundedSource into
>>>>>>> using a unified backlog-reporting mechanism with optional other signals
>>>>>>> that Dataflow has found useful (such as is the remaining restriction
>>>>>>> splittable (yes, no, unknown)). Other runners can use or not. SDFs 
>>>>>>> should
>>>>>>> report backlog and watermark as minimum bar. The backlog should use an
>>>>>>> arbitrary precision float such as Java BigDecimal to prevent issues 
>>>>>>> where
>>>>>>> limited precision removes the ability to compute delta efficiently.
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > On Wed, Sep 12, 2018 at 3:54 PM Lukasz Cwik <[email protected]>
>>>>>>> wrote:
>>>>>>> >>
>>>>>>> >> Here is the link to join the discussion:
>>>>>>> https://meet.google.com/idc-japs-hwf
>>>>>>> >> Remember that it is this Friday Sept 14th from 11am-noon PST.
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> On Mon, Sep 10, 2018 at 7:30 AM Maximilian Michels <
>>>>>>> [email protected]> wrote:
>>>>>>> >>>
>>>>>>> >>> Thanks for moving forward with this, Lukasz!
>>>>>>> >>>
>>>>>>> >>> Unfortunately, can't make it on Friday but I'll sync with
>>>>>>> somebody on
>>>>>>> >>> the call (e.g. Ryan) about your discussion.
>>>>>>> >>>
>>>>>>> >>> On 08.09.18 02:00, Lukasz Cwik wrote:
>>>>>>> >>> > Thanks for everyone who wanted to fill out the doodle poll.
>>>>>>> The most
>>>>>>> >>> > popular time was Friday Sept 14th from 11am-noon PST. I'll
>>>>>>> send out a
>>>>>>> >>> > calendar invite and meeting link early next week.
>>>>>>> >>> >
>>>>>>> >>> > I have received a lot of feedback on the document and have
>>>>>>> addressed
>>>>>>> >>> > some parts of it including:
>>>>>>> >>> > * clarifying terminology
>>>>>>> >>> > * processing skew due to some restrictions having their
>>>>>>> watermarks much
>>>>>>> >>> > further behind then others affecting scheduling of bundles by
>>>>>>> runners
>>>>>>> >>> > * external throttling & I/O wait overhead reporting to make
>>>>>>> sure we
>>>>>>> >>> > don't overscale
>>>>>>> >>> >
>>>>>>> >>> > Areas that still need additional feedback and details are:
>>>>>>> >>> > * reporting progress around the work that is done and is active
>>>>>>> >>> > * more examples
>>>>>>> >>> > * unbounded restrictions being caused by an unbounded number
>>>>>>> of splits
>>>>>>> >>> > of existing unbounded restrictions (infinite work growth)
>>>>>>> >>> > * whether we should be reporting this information at the
>>>>>>> PTransform
>>>>>>> >>> > level or at the bundle level
>>>>>>> >>> >
>>>>>>> >>> >
>>>>>>> >>> >
>>>>>>> >>> > On Wed, Sep 5, 2018 at 1:53 PM Lukasz Cwik <[email protected]
>>>>>>> >>> > <mailto:[email protected]>> wrote:
>>>>>>> >>> >
>>>>>>> >>> >     Thanks to all those who have provided interest in this
>>>>>>> topic by the
>>>>>>> >>> >     questions they have asked on the doc already and for those
>>>>>>> >>> >     interested in having this discussion. I have setup this
>>>>>>> doodle to
>>>>>>> >>> >     allow people to provide their availability:
>>>>>>> >>> >     https://doodle.com/poll/nrw7w84255xnfwqy
>>>>>>> >>> >
>>>>>>> >>> >     I'll send out the chosen time based upon peoples
>>>>>>> availability and a
>>>>>>> >>> >     Hangout link by end of day Friday so please mark your
>>>>>>> availability
>>>>>>> >>> >     using the link above.
>>>>>>> >>> >
>>>>>>> >>> >     The agenda of the meeting will be as follows:
>>>>>>> >>> >     * Overview of the proposal
>>>>>>> >>> >     * Enumerate and discuss/answer questions brought up in the
>>>>>>> meeting
>>>>>>> >>> >
>>>>>>> >>> >     Note that all questions and any discussions/answers
>>>>>>> provided will be
>>>>>>> >>> >     added to the doc for those who are unable to attend.
>>>>>>> >>> >
>>>>>>> >>> >     On Fri, Aug 31, 2018 at 9:47 AM Jean-Baptiste Onofré
>>>>>>> >>> >     <[email protected] <mailto:[email protected]>> wrote:
>>>>>>> >>> >
>>>>>>> >>> >         +1
>>>>>>> >>> >
>>>>>>> >>> >         Regards
>>>>>>> >>> >         JB
>>>>>>> >>> >         Le 31 août 2018, à 18:22, Lukasz Cwik <
>>>>>>> [email protected]
>>>>>>> >>> >         <mailto:[email protected]>> a écrit:
>>>>>>> >>> >
>>>>>>> >>> >             That is possible, I'll take people's date/time
>>>>>>> suggestions
>>>>>>> >>> >             and create a simple online poll with them.
>>>>>>> >>> >
>>>>>>> >>> >             On Fri, Aug 31, 2018 at 2:22 AM Robert Bradshaw
>>>>>>> >>> >             <[email protected] <mailto:[email protected]>>
>>>>>>> wrote:
>>>>>>> >>> >
>>>>>>> >>> >                 Thanks for taking this up. I added some
>>>>>>> comments to the
>>>>>>> >>> >                 doc. A European-friendly time for discussion
>>>>>>> would
>>>>>>> >>> >                 be great.
>>>>>>> >>> >
>>>>>>> >>> >                 On Fri, Aug 31, 2018 at 3:14 AM Lukasz Cwik
>>>>>>> >>> >                 <[email protected] <mailto:[email protected]>>
>>>>>>> wrote:
>>>>>>> >>> >
>>>>>>> >>> >                     I came up with a proposal[1] for a
>>>>>>> progress model
>>>>>>> >>> >                     solely based off of the backlog and that
>>>>>>> splits
>>>>>>> >>> >                     should be based upon the remaining backlog
>>>>>>> we want
>>>>>>> >>> >                     the SDK to split at. I also give
>>>>>>> recommendations to
>>>>>>> >>> >                     runner authors as to how an autoscaling
>>>>>>> system could
>>>>>>> >>> >                     work based upon the measured backlog. A
>>>>>>> lot of
>>>>>>> >>> >                     discussions around progress reporting and
>>>>>>> splitting
>>>>>>> >>> >                     in the past has always been around finding
>>>>>>> an
>>>>>>> >>> >                     optimal solution, after reading a lot of
>>>>>>> information
>>>>>>> >>> >                     about work stealing, I don't believe there
>>>>>>> is a
>>>>>>> >>> >                     general solution and it really is upto
>>>>>>> >>> >                     SplittableDoFns to be well behaved. I did
>>>>>>> not do
>>>>>>> >>> >                     much work in classifying what a well
>>>>>>> behaved
>>>>>>> >>> >                     SplittableDoFn is though. Much of this
>>>>>>> work builds
>>>>>>> >>> >                     off ideas that Eugene had documented in
>>>>>>> the past[2].
>>>>>>> >>> >
>>>>>>> >>> >                     I could use the communities wide knowledge
>>>>>>> of
>>>>>>> >>> >                     different I/Os to see if computing the
>>>>>>> backlog is
>>>>>>> >>> >                     practical in the way that I'm suggesting
>>>>>>> and to
>>>>>>> >>> >                     gather people's feedback.
>>>>>>> >>> >
>>>>>>> >>> >                     If there is a lot of interest, I would
>>>>>>> like to hold
>>>>>>> >>> >                     a community video conference between Sept
>>>>>>> 10th and
>>>>>>> >>> >                     14th about this topic. Please reply with
>>>>>>> your
>>>>>>> >>> >                     availability by Sept 6th if your
>>>>>>> interested.
>>>>>>> >>> >
>>>>>>> >>> >                     1:
>>>>>>> https://s.apache.org/beam-bundles-backlog-splitting
>>>>>>> >>> >                     2:
>>>>>>> https://s.apache.org/beam-breaking-fusion
>>>>>>> >>> >
>>>>>>> >>> >                     On Mon, Aug 13, 2018 at 10:21 AM
>>>>>>> Jean-Baptiste
>>>>>>> >>> >                     Onofré <[email protected] <mailto:
>>>>>>> [email protected]>> wrote:
>>>>>>> >>> >
>>>>>>> >>> >                         Awesome !
>>>>>>> >>> >
>>>>>>> >>> >                         Thanks Luke !
>>>>>>> >>> >
>>>>>>> >>> >                         I plan to work with you and others on
>>>>>>> this one.
>>>>>>> >>> >
>>>>>>> >>> >                         Regards
>>>>>>> >>> >                         JB
>>>>>>> >>> >                         Le 13 août 2018, à 19:14, Lukasz Cwik
>>>>>>> >>> >                         <[email protected] <mailto:
>>>>>>> [email protected]>> a
>>>>>>> >>> >                         écrit:
>>>>>>> >>> >
>>>>>>> >>> >                             I wanted to reach out that I will
>>>>>>> be
>>>>>>> >>> >                             continuing from where Eugene left
>>>>>>> off with
>>>>>>> >>> >                             SplittableDoFn. I know that many
>>>>>>> of you have
>>>>>>> >>> >                             done a bunch of work with IOs
>>>>>>> and/or runner
>>>>>>> >>> >                             integration for SplittableDoFn and
>>>>>>> would
>>>>>>> >>> >                             appreciate your help in advancing
>>>>>>> this
>>>>>>> >>> >                             awesome idea. If you have
>>>>>>> questions or
>>>>>>> >>> >                             things you want to get reviewed
>>>>>>> related to
>>>>>>> >>> >                             SplittableDoFn, feel free to send
>>>>>>> them my
>>>>>>> >>> >                             way or include me on anything
>>>>>>> SplittableDoFn
>>>>>>> >>> >                             related.
>>>>>>> >>> >
>>>>>>> >>> >                             I was part of several discussions
>>>>>>> with
>>>>>>> >>> >                             Eugene and I think the biggest
>>>>>>> outstanding
>>>>>>> >>> >                             design portion is to figure out
>>>>>>> how dynamic
>>>>>>> >>> >                             work rebalancing would play out
>>>>>>> with the
>>>>>>> >>> >                             portability APIs. This includes
>>>>>>> reporting of
>>>>>>> >>> >                             progress from within a bundle. I
>>>>>>> know that
>>>>>>> >>> >                             Eugene had shared some documents
>>>>>>> in this
>>>>>>> >>> >                             regard but the position / split
>>>>>>> models
>>>>>>> >>> >                             didn't work too cleanly in a
>>>>>>> unified sense
>>>>>>> >>> >                             for bounded and unbounded
>>>>>>> SplittableDoFns.
>>>>>>> >>> >                             It will likely take me awhile to
>>>>>>> gather my
>>>>>>> >>> >                             thoughts but could use your
>>>>>>> expertise as to
>>>>>>> >>> >                             how compatible these ideas are
>>>>>>> with respect
>>>>>>> >>> >                             to to IOs and runners
>>>>>>> >>> >
>>>>>>>  Flink/Spark/Dataflow/Samza/Apex/... and
>>>>>>> >>> >                             obviously help during
>>>>>>> implementation.
>>>>>>> >>> >
>>>>>>>
>>>>>>

Re: SplittableDoFn

Reply via email to