chamikaramj commented on code in PR #33504:
URL: https://github.com/apache/beam/pull/33504#discussion_r1984039645


##########
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/IcebergIO.java:
##########
@@ -310,7 +409,155 @@
  *   </tr>
  * </table>
  *
- * <p>For internal use only; no backwards compatibility guarantees
+ * <br>
+ * <br>
+ *
+ * <h2>Reading from Tables</h2>
+ *
+ * With the following configuration,
+ *
+ * <pre>{@code
+ * Map<String, Object> config = Map.of(
+ *         "table", table,
+ *         "catalog_name", name,
+ *         "catalog_properties", Map.of(...),
+ *         "config_properties", Map.of(...));
+ * }</pre>
+ *
+ * Example of a simple batch read:
+ *
+ * <pre>{@code
+ * PCollection<Row> rows = pipeline
+ *     .apply(Managed.read(ICEBERG).withConfig(config))
+ *     .getSinglePCollection();
+ * }</pre>
+ *
+ * Example of a simple CDC read:
+ *
+ * <pre>{@code
+ * PCollection<Row> output = pipeline
+ *     .apply(Managed.read(ICEBERG_CDC).withConfig(config))
+ *     .getSinglePCollection();
+ *
+ * PCollection<Row> rows = output
+ *     .apply(ReadUtils.extractRecords());
+ * }</pre>
+ *
+ * <p><b>Note</b>: This reads <b>append-only</b> snapshots. Full CDC is not 
supported yet.
+ *
+ * <p>The CDC <b>streaming</b> source (enabled with {@code streaming=true}) 
continuously polls the

Review Comment:
   We should validate (and fail) somewhere if the "streaming" flag is set here 
and the streaming PipelineOption [1] is not set.
   
   [1] 
https://github.com/apache/beam/blob/c1d0fa4f850f70a7dd5817227127fd9beadcc73e/sdks/java/core/src/main/java/org/apache/beam/sdk/options/StreamingOptions.java#L38



##########
sdks/java/managed/src/main/java/org/apache/beam/sdk/managed/Managed.java:
##########
@@ -108,6 +110,7 @@ public class Managed {
    *
    * <ul>
    *   <li>{@link Managed#ICEBERG} : Read from Apache Iceberg tables
+   *   <li>{@link Managed#ICEBERG_CDC} : CDC Read from Apache Iceberg tables

Review Comment:
   We should link to locations where users can find additional Javadocs related 
to each of these options (also for write).



##########
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/IcebergIO.java:
##########
@@ -310,7 +409,155 @@
  *   </tr>
  * </table>
  *
- * <p>For internal use only; no backwards compatibility guarantees
+ * <br>
+ * <br>
+ *
+ * <h2>Reading from Tables</h2>
+ *
+ * With the following configuration,
+ *
+ * <pre>{@code
+ * Map<String, Object> config = Map.of(
+ *         "table", table,
+ *         "catalog_name", name,
+ *         "catalog_properties", Map.of(...),
+ *         "config_properties", Map.of(...));
+ * }</pre>
+ *
+ * Example of a simple batch read:
+ *
+ * <pre>{@code
+ * PCollection<Row> rows = pipeline
+ *     .apply(Managed.read(ICEBERG).withConfig(config))
+ *     .getSinglePCollection();
+ * }</pre>
+ *
+ * Example of a simple CDC read:
+ *
+ * <pre>{@code
+ * PCollection<Row> output = pipeline
+ *     .apply(Managed.read(ICEBERG_CDC).withConfig(config))
+ *     .getSinglePCollection();
+ *
+ * PCollection<Row> rows = output
+ *     .apply(ReadUtils.extractRecords());
+ * }</pre>
+ *
+ * <p><b>Note</b>: This reads <b>append-only</b> snapshots. Full CDC is not 
supported yet.
+ *
+ * <p>The CDC <b>streaming</b> source (enabled with {@code streaming=true}) 
continuously polls the
+ * table for new snapshots, with a default interval of 60 seconds. This can be 
overridden using
+ * <b>{@code poll_interval_seconds}</b>:
+ *
+ * <pre>{@code
+ * config.put("streaming", true);
+ * config.put("poll_interval_seconds", 10);
+ * }</pre>
+ *
+ * <h3>Output Schema</h3>
+ *
+ * <p>Reading with <b>{@code Managed.read(ICEBERG)}</b> produces a <b>{@code 
PCollection<Row>}</b>
+ * containing data records that conform to the table schema.
+ *
+ * <p>Reading with <b>{@code Managed.read(ICEBERG_CDC)}</b> produces a 
<b>{@code
+ * PCollection<Row>}</b> with the following schema:
+ *
+ * <table border="1" cellspacing="1">
+ *   <tr>
+ *     <td> <b>Field</b> </td> <td> <b>Type</b> </td> <td> <b>Description</b> 
</td>
+ *   </tr>
+ *   <tr>
+ *     <td> {@code record} </td>
+ *     <td> {@code Beam Row} </td>
+ *     <td>
+ *       The data record.
+ *     </td>
+ *   </tr>
+ *   <tr>
+ *     <td> {@code operation} </td>
+ *     <td> {@code string} </td>
+ *     <td>
+ *       The snapshot <a 
href="https://iceberg.apache.org/javadoc/0.11.0/org/apache/iceberg/DataOperations";>operation</a>
 associated with this record. For now, only "append" is supported.

Review Comment:
   May be change to "APPEND" to be consistent with Iceberg.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to