rosetn commented on a change in pull request #13227:
URL: https://github.com/apache/beam/pull/13227#discussion_r527219546
##########
File path: website/www/site/content/en/documentation/io/developing-io-overview.md
##########
@@ -46,33 +46,32 @@ are the recommended steps to get started:
For **bounded (batch) sources**, there are currently two options for creating a
Beam source:
+1. Use `Splittable DoFn`.
+
1. Use `ParDo` and `GroupByKey`.
-1. Use the `Source` interface and extend the `BoundedSource` abstract subclass.
-`ParDo` is the recommended option, as implementing a `Source` can be tricky. See
-[When to use the Source interface](#when-to-use-source) for a list of some use
-cases where you might want to use a `Source` (such as
-[dynamic work rebalancing](/blog/2016/05/18/splitAtFraction-method.html)).
+`Splittable DoFn` is the recommended option, as it's the new source framework for both bounded and
Review comment:
Is there a way to avoid the word "new"? Would "most recent" or "provides the most support" work?
##########
File path: website/www/site/content/en/documentation/io/developing-io-java.md
##########
@@ -17,6 +17,9 @@ limitations under the License.
-->
# Developing I/O connectors for Java
+**IMPORTANT:** Please use ``Splittable DoFn`` to develop your new I/O. For more details, please read
Review comment:
Remove instances of "please" on this page
Reference: https://developers.google.com/style/tone#politeness
##########
File path: website/www/site/content/en/documentation/io/developing-io-overview.md
##########
@@ -90,22 +89,40 @@ performance:
jobs. Depending on your data source, dynamic work rebalancing might not be
possible.
-* **Splitting into parts of particular size recommended by the runner:** `ParDo`
- does not receive `desired_bundle_size` as a hint from runners when performing
- initial splitting.
+* **Splitting initially to increase parallelism:** `ParDo`
+ does not have the ability to perform initial splitting.
For example, if you'd like to read from a new file format that contains many
records per file, or if you'd like to read from a key-value store that supports
read operations in sorted key order.
-### Source lifecycle {#source}
-Here is a sequence diagram that shows the lifecycle of the Source during
- the execution of the Read transform of an IO. The comments give useful
- information to IO developers such as the constraints that
- apply to the objects or particular cases such as streaming mode.
-
- <!-- The source for the sequence diagram can be found in the the SVG resource. -->
-
+### Real World IO Examples Using Splittable DoFn
Review comment:
I'd replace this with:
I/O examples using SDFs
##########
File path: website/www/site/content/en/documentation/io/developing-io-overview.md
##########
@@ -46,33 +46,32 @@ are the recommended steps to get started:
For **bounded (batch) sources**, there are currently two options for creating a
Beam source:
+1. Use `Splittable DoFn`.
+
1. Use `ParDo` and `GroupByKey`.
-1. Use the `Source` interface and extend the `BoundedSource` abstract subclass.
-`ParDo` is the recommended option, as implementing a `Source` can be tricky. See
-[When to use the Source interface](#when-to-use-source) for a list of some use
-cases where you might want to use a `Source` (such as
-[dynamic work rebalancing](/blog/2016/05/18/splitAtFraction-method.html)).
+`Splittable DoFn` is the recommended option, as it's the new source framework for both bounded and
+unbounded sources. This is meant to replace the `Source` APIs(
+[BoundedSource](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/BoundedSource.html) and
+[UnboundedSource](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/UnboundedSource.html))
+in the new system. Please read
+[Splittable DoFn Programming Guide](/learn/programming-guide/#splittable-dofns) for how to write one
+Splittable DoFn. For more information, see the
+[roadmap for multi-SDK connector efforts](/roadmap/connectors-multi-sdk/).
-(Java only) For **unbounded (streaming) sources**, you must use the `Source`
-interface and extend the `UnboundedSource` abstract subclass. `UnboundedSource`
-supports features that are useful for streaming pipelines, such as
-checkpointing.
+For java and python **unbounded (streaming) sources**, you must use the `Splittable DoFn`, which
Review comment:
Capitalize Java and Python
----------------------------------------------------------------
This is an automated message from the Apache Git Service.