I wanted to know if anybody has any comments on the external transform API
for the Java SDK.

`External.of()` creates an external transform in the Java SDK. Depending on
the input and output types, two additional methods are provided:
`withMultiOutputs()`, which declares that the output is a PCollectionTuple
rather than a single PCollection, and `withOutputType()`, which specifies the
type of the output elements. Some examples:

PCollection<String> col =
    testPipeline
        .apply(Create.of("1", "2", "3"))
        .apply(External.of(...));

This works without additional methods because 1) the input and output types
of the external transform can be inferred and 2) the output is a single
PCollection.

PCollectionTuple pTuple =
    testPipeline
        .apply(Create.of(1, 2, 3, 4, 5, 6))
        .apply(
            External.of(...).withMultiOutputs());

This requires `withMultiOutputs()` because the output is a PCollectionTuple
rather than a single PCollection.

PCollection<String> pCol =
    testPipeline
        .apply(Create.of("1", "2", "2", "3", "3", "3"))
        .apply(
            External.of(...)
                .<KV<String, Long>>withOutputType())
        .apply(
            "toString",
            MapElements.into(TypeDescriptors.strings()).via(
                x -> String.format("%s->%s", x.getKey(), x.getValue())));

This requires `withOutputType()` because the output element type cannot be
inferred from the method chain. I think some users may find it awkward to
call a method with only a type parameter and empty parentheses. Without
`withOutputType()`, the output element type will be java.lang.Object, which
can still be forcibly cast to KV.
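
To make the trade-off concrete, here is a minimal, self-contained sketch of
the two options in plain Java. It does not use Beam: `Stub`, its `of()`,
`withOutputType()` and `output()` methods, and the two `join...` helpers are
hypothetical stand-ins invented for illustration only. They mimic the shape of
the calls above, not the actual `External` implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TypeWitnessSketch {

    /** Hypothetical stand-in for a transform whose output type the compiler cannot see. */
    static final class Stub<OutT> {
        private final List<Object> elements;

        private Stub(List<Object> elements) {
            this.elements = elements;
        }

        /** With no type information available, the output type defaults to Object. */
        static Stub<Object> of(Object... elements) {
            return new Stub<Object>(new ArrayList<Object>(Arrays.asList(elements)));
        }

        /** Carries no runtime data; only the explicit type argument matters. */
        @SuppressWarnings("unchecked")
        <T> Stub<T> withOutputType() {
            return (Stub<T>) this;
        }

        /** Unchecked view of the elements as the declared output type. */
        @SuppressWarnings("unchecked")
        List<OutT> output() {
            return (List<OutT>) (List<?>) elements;
        }
    }

    /** Option 1: an explicit type witness keeps the rest of the chain typed. */
    static String joinTyped() {
        List<String> out = Stub.of("a", "b").<String>withOutputType().output();
        return String.join(",", out);
    }

    /** Option 2: no witness, so each element is an Object and must be cast by hand. */
    static String joinWithCast() {
        List<Object> out = Stub.of("a", "b").output();
        StringBuilder sb = new StringBuilder();
        for (Object o : out) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append((String) o);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinTyped());
        System.out.println(joinWithCast());
    }
}
```

In both variants the cast is unchecked, so a wrong type only surfaces as a
ClassCastException when an element is actually used. The difference is just
whether the user writes the type once on the transform or at every use site.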

Thanks,
Heejong
