abhiy13 commented on a change in pull request #12645:
URL: https://github.com/apache/beam/pull/12645#discussion_r475174519
##########
File path: sdks/java/io/contextual-text-io/src/main/java/org/apache/beam/sdk/io/contextualtextio/ContextualTextIO.java
##########
@@ -151,8 +157,27 @@
* .apply(ContextualTextIO.readFiles());
* }</pre>
*
- * NOTE: Using {@link ContextualTextIO.Read#withHasMultilineCSVRecords(Boolean)} introduces a
- * performance penalty: when this option is enabled, the input cannot be split and read in parallel.
+ * <p>Example 6: reading without recordNum metadata, or with only fileName-associated metadata. (The
+ * objects would still contain recordNums, but these recordNums would correspond to their positions
+ * within their respective offsets rather than their positions within the entire file.)
+ *
+ * <pre>{@code
+ * Pipeline p = ...;
+ *
+ * PCollection<RecordWithMetadata> records = p.apply(ContextualTextIO.read()
+ * .from("/local/path/to/files/*.csv")
+ * .setWithoutRecordNumMetadata(true));
+ * }</pre>
+ *
+ * <p>NOTE: When using {@link ContextualTextIO.Read#withHasMultilineCSVRecords(Boolean)}, a single
+ * reader will be used to process the file, rather than multiple readers which can
+ * read from different offsets. For a large file this can result in lower performance.
+ *
+ * <p>NOTE: Use {@link Read#withoutRecordNumMetadata()} when recordNum metadata is not required or
+ * when only metadata associated with filenames is required. Not using this option introduces a
+ * shuffle step which increases the resources used by the pipeline. <b>This option is set to false
+ * by default, meaning that the shuffle step will be performed; set it to true to avoid the shuffle
Review comment:
Done.
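
For readers of this thread, the difference between offset-local and file-wide recordNums described in Example 6 can be sketched in plain Java. This is only an illustration under stated assumptions, not Beam's implementation: the class `RecordNumSketch`, its methods, and the zero-based numbering are hypothetical and do not appear in the PR. The idea shown is that turning a range-local record number into a file-wide one requires knowing how many records every earlier range contained, which is the kind of cross-range information a shuffle step would have to gather.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names come from ContextualTextIO.
public class RecordNumSketch {

    /**
     * Given the number of records found in each byte range of a file (in file order),
     * returns the file-wide record number at which each range starts (zero-based).
     */
    public static long[] globalStarts(long[] recordsPerRange) {
        long[] starts = new long[recordsPerRange.length];
        long running = 0;
        for (int i = 0; i < recordsPerRange.length; i++) {
            starts[i] = running;           // records seen before this range
            running += recordsPerRange[i]; // add this range's record count
        }
        return starts;
    }

    /** Converts a range-local record number into a file-wide record number. */
    public static long toGlobal(long[] starts, int rangeIndex, long localRecordNum) {
        return starts[rangeIndex] + localRecordNum;
    }

    public static void main(String[] args) {
        long[] counts = {3, 2, 4};            // records read by three readers
        long[] starts = globalStarts(counts);
        System.out.println(Arrays.toString(starts)); // prints [0, 3, 5]
        System.out.println(toGlobal(starts, 2, 1));  // prints 6
    }
}
```

Without the per-range counts, each reader can only report the local number (1 in the last call above), which matches the behaviour the Javadoc describes when recordNum metadata is turned off.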