ahmedabu98 commented on code in PR #33504:
URL: https://github.com/apache/beam/pull/33504#discussion_r1987700344


##########
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/IcebergIO.java:
##########
@@ -310,7 +409,155 @@
  *   </tr>
  * </table>
  *
- * <p>For internal use only; no backwards compatibility guarantees
+ * <br>
+ * <br>
+ *
+ * <h2>Reading from Tables</h2>
+ *
+ * With the following configuration,
+ *
+ * <pre>{@code
+ * Map<String, Object> config = Map.of(
+ *         "table", table,
+ *         "catalog_name", name,
+ *         "catalog_properties", Map.of(...),
+ *         "config_properties", Map.of(...));
+ * }</pre>
+ *
+ * Example of a simple batch read:
+ *
+ * <pre>{@code
+ * PCollection<Row> rows = pipeline
+ *     .apply(Managed.read(ICEBERG).withConfig(config))
+ *     .getSinglePCollection();
+ * }</pre>
+ *
+ * Example of a simple CDC read:
+ *
+ * <pre>{@code
+ * PCollection<Row> output = pipeline
+ *     .apply(Managed.read(ICEBERG_CDC).withConfig(config))
+ *     .getSinglePCollection();
+ *
+ * PCollection<Row> rows = output
+ *     .apply(ReadUtils.extractRecords());
+ * }</pre>
+ *
+ * <p><b>Note</b>: This reads <b>append-only</b> snapshots. Full CDC is not supported yet.
+ *
+ * <p>The CDC <b>streaming</b> source (enabled with {@code streaming=true}) continuously polls the
+ * table for new snapshots, with a default interval of 60 seconds. This can be overridden using
+ * <b>{@code poll_interval_seconds}</b>:
+ *
+ * <pre>{@code
+ * config.put("streaming", true);
+ * config.put("poll_interval_seconds", 10);
+ * }</pre>
+ *
+ * <h3>Output Schema</h3>
+ *
+ * <p>Reading with <b>{@code Managed.read(ICEBERG)}</b> produces a <b>{@code PCollection<Row>}</b>
+ * containing data records that conform to the table schema.
+ *
+ * <p>Reading with <b>{@code Managed.read(ICEBERG_CDC)}</b> produces a <b>{@code
+ * PCollection<Row>}</b> with the following schema:
+ *
+ * <table border="1" cellspacing="1">
+ *   <tr>
+ *     <td> <b>Field</b> </td> <td> <b>Type</b> </td> <td> <b>Description</b> </td>
+ *   </tr>
+ *   <tr>
+ *     <td> {@code record} </td>
+ *     <td> {@code Beam Row} </td>
+ *     <td>
+ *       The data record.
+ *     </td>
+ *   </tr>
+ *   <tr>
+ *     <td> {@code operation} </td>
+ *     <td> {@code string} </td>
+ *     <td>
+ *       The snapshot <a href="https://iceberg.apache.org/javadoc/0.11.0/org/apache/iceberg/DataOperations">operation</a> associated with this record. For now, only "append" is supported.

Review Comment:
   The value is actually lowercase: `DataOperations.APPEND` is defined as the string `"append"` (see [ref](https://github.com/apache/iceberg/blob/6e8718113c08aebf76d8e79a9e2534c89c73407a/api/src/main/java/org/apache/iceberg/DataOperations.java#L31-L37))
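
To illustrate why the casing matters for anyone filtering on the `operation` field, here is a minimal sketch. The class and helper names are hypothetical, and the constant is copied from `DataOperations` rather than imported so the snippet compiles without Iceberg on the classpath; real code would presumably reference `DataOperations.APPEND` directly.

```java
public class OperationCheck {
  // Assumption: mirrors org.apache.iceberg.DataOperations.APPEND, which is lowercase.
  static final String APPEND = "append";

  // Hypothetical helper: exact, case-sensitive match against the operation string.
  static boolean isAppend(String operation) {
    return APPEND.equals(operation);
  }

  public static void main(String[] args) {
    System.out.println(isAppend("append")); // prints "true"
    System.out.println(isAppend("APPEND")); // prints "false": uppercase does not match
  }
}
```

A case-sensitive comparison against an uppercase literal would silently drop every record, so documenting the exact lowercase value seems safer than leaving the casing ambiguous.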
   



