lostluck commented on a change in pull request #11517:
URL: https://github.com/apache/beam/pull/11517#discussion_r419635469
##########
File path: sdks/go/pkg/beam/core/sdf/sdf.go
##########
@@ -13,63 +13,64 @@
// See the License for the specific language governing permissions and
// limitations under the License.
-// Package sdf is experimental, incomplete, and not yet meant for general
usage.
+// Package contains interfaces used specifically for splittable DoFns.
+//
+// Warning: Splittable DoFns are still experimental, largely untested, and
+// likely to have bugs.
package sdf
// RTracker is an interface used to interact with restrictions while
processing elements in
-// SplittableDoFns. Each implementation of RTracker is expected to be used for
tracking a single
-// restriction type, which is the type that should be used to create the
RTracker, and output by
-// TrySplit.
+// splittable DoFns (specifically, in the ProcessElement method). Each
RTracker tracks the progress
+// of a single restriction.
type RTracker interface {
- // TryClaim attempts to claim the block of work in the current
restriction located at a given
- // position. This method must be used in the ProcessElement method of
Splittable DoFns to claim
- // work before performing it. If no work is claimed, the ProcessElement
is not allowed to perform
- // work or emit outputs. If the claim is successful, the DoFn must
process the entire block. If
- // the claim is unsuccessful the ProcessElement method of the DoFn must
return without performing
- // any additional work or emitting any outputs.
- //
- // TryClaim accepts an arbitrary value that can be interpreted as the
position of a block, and
- // returns a boolean indicating whether the claim succeeded.
+ // TryClaim attempts to claim the block of work located in the given
position of the
+ // restriction. This method must be called in ProcessElement to claim
work before it can be
+ // processed. Processing work without claiming it first can lead to
incorrect output.
//
- // If the claim fails due to an error, that error can be retrieved with
GetError.
+ // If the claim is successful, the DoFn must process the entire block.
If the claim is
+ // unsuccessful ProcessElement method of the DoFn must return without
performing
+ // any additional work or emitting any outputs.
//
- // For SDFs to work properly, claims must always be monotonically
increasing in reference to the
- // restriction's start and end points, and every block of work in a
restriction must be claimed.
+ // If the claim fails due to an error, that error is stored and will be
automatically emitted
+ // when the RTracker is validated, or can be manually retrieved with
GetError.
//
// This pseudocode example illustrates the typical usage of TryClaim:
//
- // pos = position of first block after restriction.start
+ // pos = position of first block within the restriction
// for TryClaim(pos) == true {
// // Do all work in the claimed block and emit outputs.
- // pos = position of next block
+ // pos = position of next block within the restriction
// }
// return
TryClaim(pos interface{}) (ok bool)
- // GetError returns the error that made this RTracker stop executing,
and it returns nil if no
- // error occurred. If IsDone fails while validating this RTracker, this
method will be
- // called to log the error.
+ // GetError returns the error that made this RTracker stop executing,
and returns nil if no
+ // error occurred. This is the error that is emitted if automated
validation fails.
GetError() error
- // TrySplit splits the current restriction into a primary and residual
based on a fraction of the
- // work remaining. The split is performed along the first valid split
point located after the
- // given fraction of the remainder. This method is called by the SDK
harness when receiving a
- // split request by the runner.
+ // TrySplit splits the current restriction into a primary (currently
executing work) and
+ // residual (work to be split off) based on a fraction of work
remaining. The split is performed
+ // at the first valid split point located after the given fraction of
remaining work.
+ //
+ // For example, a fraction of 0.5 means to split at the halfway point
of remaining work only. If
+ // 50% of work is done and 50% remaining, then a fraction of 0.5 would
split after 75% of work.
+ //
+ // This method modifies the underlying restriction in the RTracker to
reflect the primary. It
+ // then returns a copy of the newly modified restriction as a primary,
and returns a new
+ // restriction for the residual. If the split would produce an empty
residual (i.e. the only
+ // split point is the end of the restriction), then the returned
residual is nil.
//
- // The current restriction is split into two by modifying the current
restriction's endpoint to
- // turn it into the primary, and returning a new restriction tracker
representing the residual.
- // If no valid split point exists, this method returns nil instead of a
residual, but does not
- // return an error. If this method is unable to split due to some error
then it returns nil and
- // an error.
- TrySplit(fraction float64) (residual interface{}, err error)
+ // If an error is returned, some catastrophic failure occurred and the
entire bundle will fail.
+ TrySplit(fraction float64) (primary, residual interface{}, err error)
Review comment:
Technically since GetError is going to be reading from a location that
might be concurrently modified, it's required to be thread safe as well.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]