iemejia commented on a change in pull request #13585:
URL: https://github.com/apache/beam/pull/13585#discussion_r546732773
##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
##########
@@ -1741,9 +1740,8 @@ public void populateDisplayData(DisplayData.Builder
builder) {
}
@Override
- public PDone expand(PCollection<T> input) {
- input.apply(inner);
- return PDone.in(input.getPipeline());
+ public WriteFilesResult<?> expand(PCollection<T> input) {
Review comment:
This solution looks ok, but I am wary of the consequences of changing
the return type. It looks like we introduced TypedWrite on Beam for the achieve
the same goal without breaking backwards compatibility.
The 'modern' preferred way to write files on Beam is via `FileIO.write()`
which already returns `WriteFilesResult`.
Can you check if you we can achieve the intended results by relying on
FileIO.write() + AvroIO.sink() ? or is there anything missing?
CC: @jkff in case you have some extra detail to add.
##########
File path:
sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSchemaIOProvider.java
##########
@@ -109,12 +107,12 @@ public Schema schema() {
}
@Override
- public PTransform<PCollection<Row>, POutput> buildWriter() {
+ public PTransform<PCollection<Row>, WriteFilesResult<?>> buildWriter() {
Review comment:
This looks good, here the change has less issues since this class is
`@Internal`
CC @TheNeuralBit for awarenes.
##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java
##########
@@ -90,7 +89,8 @@
/**
* A {@link PTransform} that writes to a {@link FileBasedSink}. A write begins
with a sequential
* global initialization of a sink, followed by a parallel write, and ends
with a sequential
- * finalization of the write. The output of a write is {@link PDone}.
+ * finalization of the write. The output of a write is {@link
WriteFilesResult} with the files
Review comment:
:+1:
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]