[GitHub] [beam] rosetn commented on a change in pull request #13227: [BEAM-10480] Add splittable dofn as the recommended way of building connectors.

GitBox Thu, 19 Nov 2020 13:46:27 -0800


rosetn commented on a change in pull request #13227:
URL: https://github.com/apache/beam/pull/13227#discussion_r527219546




##########
File path: website/www/site/content/en/documentation/io/developing-io-overview.md
##########
@@ -46,33 +46,32 @@ are the recommended steps to get started:
 For **bounded (batch) sources**, there are currently two options for creating a
 Beam source:
 
+1. Use `Splittable DoFn`.
+
 1. Use `ParDo` and `GroupByKey`.
 
-1. Use the `Source` interface and extend the `BoundedSource` abstract subclass.
 
-`ParDo` is the recommended option, as implementing a `Source` can be tricky. See
-[When to use the Source interface](#when-to-use-source) for a list of some use
-cases where you might want to use a `Source` (such as
-[dynamic work rebalancing](/blog/2016/05/18/splitAtFraction-method.html)).
+`Splittable DoFn` is the recommended option, as it's the new source framework for both bounded and

Review comment:
       Is there a way to avoid the word "new" here and still have the sentence work?
   
   "Most recent"? "Provides the most support"?

##########
File path: website/www/site/content/en/documentation/io/developing-io-java.md
##########
@@ -17,6 +17,9 @@ limitations under the License.
 -->
 # Developing I/O connectors for Java
 
+**IMPORTANT:** Please use ``Splittable DoFn`` to develop your new I/O. For more details, please read

Review comment:
       Remove instances of "please" on this page
   
   Reference: https://developers.google.com/style/tone#politeness

##########
File path: website/www/site/content/en/documentation/io/developing-io-overview.md
##########
@@ -90,22 +89,40 @@ performance:
   jobs. Depending on your data source, dynamic work rebalancing might not be
   possible.
 
-* **Splitting into parts of particular size recommended by the runner:** `ParDo`
-  does not receive `desired_bundle_size` as a hint from runners when performing
-  initial splitting.
+* **Splitting initially to increase parallelism:** `ParDo`
+  does not have the ability to perform initial splitting.
 
 For example, if you'd like to read from a new file format that contains many
 records per file, or if you'd like to read from a key-value store that supports
 read operations in sorted key order.
 
-### Source lifecycle {#source}
-Here is a sequence diagram that shows the lifecycle of the Source during
- the execution of the Read transform of an IO. The comments give useful
- information to IO developers such as the constraints that
- apply to the objects or particular cases such as streaming mode.
-
- <!-- The source for the sequence diagram can be found in the the SVG resource. -->
-![This is a sequence diagram that shows the lifecycle of the Source](/images/source-sequence-diagram.svg)
+### Real World IO Examples Using Splittable DoFn

Review comment:
       I'd replace this with: 
   
   I/O examples using SDFs

##########
File path: website/www/site/content/en/documentation/io/developing-io-overview.md
##########
@@ -46,33 +46,32 @@ are the recommended steps to get started:
 For **bounded (batch) sources**, there are currently two options for creating a
 Beam source:
 
+1. Use `Splittable DoFn`.
+
 1. Use `ParDo` and `GroupByKey`.
 
-1. Use the `Source` interface and extend the `BoundedSource` abstract subclass.
 
-`ParDo` is the recommended option, as implementing a `Source` can be tricky. See
-[When to use the Source interface](#when-to-use-source) for a list of some use
-cases where you might want to use a `Source` (such as
-[dynamic work rebalancing](/blog/2016/05/18/splitAtFraction-method.html)).
+`Splittable DoFn` is the recommended option, as it's the new source framework for both bounded and
+unbounded sources. This is meant to replace the `Source` APIs(
+[BoundedSource](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/BoundedSource.html) and
+[UnboundedSource](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/UnboundedSource.html))
+in the new system. Please read
+[Splittable DoFn Programming Guide](/learn/programming-guide/#splittable-dofns) for how to write one
+Splittable DoFn. For more information, see the
+[roadmap for multi-SDK connector efforts](/roadmap/connectors-multi-sdk/).
 
-(Java only) For **unbounded (streaming) sources**, you must use the `Source`
-interface and extend the `UnboundedSource` abstract subclass. `UnboundedSource`
-supports features that are useful for streaming pipelines, such as
-checkpointing.
+For java and python **unbounded (streaming) sources**, you must use the `Splittable DoFn`, which

Review comment:
       Capitalize Java and Python




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
