[ 
https://issues.apache.org/jira/browse/BEAM-5605?focusedWorklogId=273521&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-273521
 ]

ASF GitHub Bot logged work on BEAM-5605:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/Jul/19 20:30
            Start Date: 08/Jul/19 20:30
    Worklog Time Spent: 10m 
      Work Description: lukecwik commented on pull request #8972: [BEAM-5605] 
Update Beam Java SDK backlog to track latest changes in Beam Python SDK.
URL: https://github.com/apache/beam/pull/8972#discussion_r301286051
 
 

 ##########
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java
 ##########
 @@ -723,6 +725,75 @@ public Duration getAllowedTimestampSkew() {
   @Experimental(Kind.SPLITTABLE_DO_FN)
   public @interface GetInitialRestriction {}
 
+  /**
+   * Annotation for the method that returns the corresponding size for an 
element and restriction
+   * pair.
+   *
+   * <p>Signature: {@code double getSize(InputT element, RestrictionT 
restriction);}
+   *
+   * <p>Returns a double representing the size of the element and restriction.
+   *
+   * <p>A representation for the amount of known work represented as a size. 
Size representations
+   * should preferably represent a linear space and be comparable within the 
same partition (see
+   * {@link GetPartition} for details on partition identifiers}).
+   *
+   * <p>Splittable {@link DoFn}s should only provide this method if the 
default implementation
+   * within the {@link RestrictionTracker} is an inaccurate representation of 
known work.
+   *
+   * <p>It is up to each splittable {@DoFn} to convert between their natural 
representation of
+   * outstanding work and this representation. For example:
+   *
+   * <ul>
+   *   <li>Block based file source (e.g. Avro): From the end of the current 
block, the remaining
+   *       number of bytes to the end of the restriction.
+   *   <li>Pull based queue based source (e.g. Pubsub): The local/global size 
available in number of
+   *       messages or number of {@code message bytes} that have not been 
processed.
+   *   <li>Key range based source (e.g. Shuffle, Bigtable, ...): Scale the 
start key to be one and
+   *       end key to be zero and interpolate the position of the next 
splittable key as the size.
+   *       If information about the probability density function or cumulative 
distribution function
+   *       is available, size interpolation can be improved. Alternatively, if 
the number of encoded
+   *       bytes for the keys and values is known for the key range, the 
number of remaining bytes
+   *       can be used.
+   * </ul>
+   */
+  @Documented
+  @Retention(RetentionPolicy.RUNTIME)
+  @Target(ElementType.METHOD)
+  @Experimental(Kind.SPLITTABLE_DO_FN)
+  public @interface GetSize {}
+
+  /**
+   * Annotation for the method that returns the corresponding partition 
identifier for an element
+   * and restriction pair.
+   *
+   * <p>Signature: {@code byte[] getPartitition(InputT element, RestrictionT 
restriction);}
+   *
+   * <p>Returns an immutable representation of the partition identifier as a 
byte[].
+   *
+   * <p>By default, the partition identifier is represented as the encoded 
element and restriction
+   * pair and should only be provided if the splittable {@link DoFn} can only 
provide a size over a
+   * shared resource such as a message queue that potentially multiple element 
and restriction pairs
+   * are doing work on. The partition identifier is used by runners for 
various size calculations.
+   * Sizes reported with the same partition identifier represent a point in 
time reporting of the
+   * size for that partition. For example, a runner can compute a global size 
by summing all
+   * reported sizes over all unique partition identifiers while it can compute 
the size of a
+   * specific partition based upon the last reported value.
+   *
+   * <p>For example splittable {@link DoFn}s which consume elements from:
+   *
+   * <ul>
+   *   <li>a globally shared resource such as a Pubsub queue should set this 
to "".
+   *   <li>a shared partitioned resource should use the partition identifier.
+   *   <li>a uniquely partitioned resource such as a file and offset range 
should not override this
+   *       since the default element and restriction pair should suffice.
+   * </ul>
+   */
+  @Documented
+  @Retention(RetentionPolicy.RUNTIME)
+  @Target(ElementType.METHOD)
+  @Experimental(Kind.SPLITTABLE_DO_FN)
+  public @interface GetPartition {}
 
 Review comment:
   In the javadoc it says,
   ```
   The partition identifier is used by runners for various size calculations.
   Sizes reported with the same partition identifier represent a point in time 
reporting of the
   size for that partition. For example, a runner can compute a global size by 
summing all
   reported sizes over all unique partition identifiers while it can compute 
the size of a
   specific partition based upon the last reported value.
   ```
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 273521)
    Time Spent: 1h  (was: 50m)

> Support Portable SplittableDoFn for batch
> -----------------------------------------
>
>                 Key: BEAM-5605
>                 URL: https://issues.apache.org/jira/browse/BEAM-5605
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-core
>            Reporter: Scott Wegner
>            Assignee: Luke Cwik
>            Priority: Major
>              Labels: portability
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Roll-up item tracking work towards supporting portable SplittableDoFn for 
> batch



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to