mosche commented on a change in pull request #15848:
URL: https://github.com/apache/beam/pull/15848#discussion_r799521812
##########
File path:
sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java
##########
@@ -196,31 +197,60 @@
* );
* }</code></pre>
*
- * <p>3. To read all data from a table in parallel with partitioning can be done with {@link
- * ReadWithPartitions}:
+ * <h4>Parallel reading from a JDBC datasource</h4>
+ *
+ * <p>Beam supports partitioned reading of all data from a table. Automatic partitioning is
+ * supported for a few data types: {@link Long}, {@link org.joda.time.DateTime}, {@link String}. To
+ * enable this, use {@link ReadWithPartitions}.
+ *
+ * <p>The partitioning scheme depends on these parameters, which can be user-provided, or
+ * automatically inferred by Beam (for the supported types):
+ *
+ * <ul>
+ * <li>Upper bound
+ * <li>Lower bound
+ * <li>Number of partitions - when auto-inferred, the number of partitions defaults to the square
+ * root of the number of rows divided by 5 (i.e.: {@code Math.floor(Math.sqrt(numRows) / 5)}).
+ * </ul>
+ *
+ * <p>To trigger auto-inference of these parameters, the user just needs to not provide them. To
+ * infer them automatically, Beam runs either of these statements:
+ *
+ * <ul>
+ * <li>{@code SELECT min(column), max(column), COUNT(*) from table} when none of the parameters is
+ * passed to the transform.
+ * <li>{@code SELECT min(column), max(column) from table} when only number of partitions is
+ * provided, but not upper or lower bounds.
+ * </ul>
+ *
+ * <p><b>Should I use this transform?</b> Consider using this transform in the following situations:
+ *
+ * <ul>
+ * <li>The partitioning column is indexed. This will help speed up the range queries
Review comment:
thx 👍
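
For anyone reading along, here is a rough sketch of the default partition count described in the Javadoc above. It only evaluates the documented expression `Math.floor(Math.sqrt(numRows) / 5)`; the class and method names are hypothetical and this is not the actual JdbcIO code.

```java
// Illustration only: PartitionCountSketch and defaultNumPartitions are
// hypothetical names, not part of the Beam API.
public final class PartitionCountSketch {

    private PartitionCountSketch() {}

    // Evaluates the formula quoted in the Javadoc for the auto-inferred
    // number of partitions: floor(sqrt(numRows) / 5).
    static long defaultNumPartitions(long numRows) {
        return (long) Math.floor(Math.sqrt((double) numRows) / 5);
    }

    public static void main(String[] args) {
        long[] rowCounts = {24L, 25L, 10_000L, 1_000_000L};
        for (long rows : rowCounts) {
            System.out.println(rows + " rows -> " + defaultNumPartitions(rows) + " partitions");
        }
    }
}
```

Taken literally, the expression evaluates to 0 for tables with fewer than 25 rows. The excerpt does not say how that case is handled, so the sketch leaves the result as-is instead of guessing at a minimum.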