Daniel Mescheder created BEAM-6719:
--------------------------------------
Summary: Allow multiple Joins in the same pipeline
Key: BEAM-6719
URL: https://issues.apache.org/jira/browse/BEAM-6719
Project: Beam
Issue Type: Improvement
Components: sdk-java-join-library
Reporter: Daniel Mescheder
Currently it is not possible to use multiple joins in the same pipeline
without wrapping each of them in its own PTransform, because the transforms
applied inside the join library would otherwise produce name clashes.
Consider the following test case:
{code:java}
@Test
public void testMultipleJoinsInSamePipeline() {
  leftListOfKv.add(KV.of("Key2", 4L));
  PCollection<KV<String, Long>> leftCollection =
      p.apply("CreateLeft", Create.of(leftListOfKv));

  rightListOfKv.add(KV.of("Key2", "bar"));
  PCollection<KV<String, String>> rightCollection =
      p.apply("CreateRight", Create.of(rightListOfKv));

  expectedResult.add(KV.of("Key2", KV.of(4L, "bar")));

  PCollection<KV<String, KV<Long, String>>> output1 =
      Join.innerJoin(leftCollection, rightCollection);
  PCollection<KV<String, KV<Long, String>>> output2 =
      Join.innerJoin(leftCollection, rightCollection);

  PAssert.that(output1).containsInAnyOrder(expectedResult);
  PAssert.that(output2).containsInAnyOrder(expectedResult);
  p.run();
}
{code}
This fails because of clashing names in the pipeline, and the join library
currently offers no way to give the joins different names.
I therefore find myself routinely wrapping joins in new PTransforms, which
leads me to believe that this should be part of the library itself.
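
To illustrate why the wrapping workaround helps, here is a minimal sketch of
hierarchical step naming in plain Java. It is a toy model and not Beam code:
ToyPipeline, applyComposite, innerJoin and the step names "CoGroup" and
"EmitPairs" are all hypothetical and only assume that a nested step's full name
is its enclosing scope's name plus "/" plus its own name.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

/** Toy stand-in for a pipeline (hypothetical, not Beam's implementation). */
class ToyPipeline {
    private final Set<String> fullNames = new HashSet<>();
    private final Deque<String> scopes = new ArrayDeque<>();

    /** Registers a step under the current scope; rejects duplicate full names. */
    String apply(String name) {
        scopes.addLast(name);
        String fullName = String.join("/", scopes); // outermost scope first
        scopes.removeLast();
        if (!fullNames.add(fullName)) {
            throw new IllegalStateException("Duplicate step name: " + fullName);
        }
        return fullName;
    }

    /** Registers a named wrapper and runs body with that name as enclosing scope. */
    void applyComposite(String name, Runnable body) {
        apply(name);
        scopes.addLast(name);
        try {
            body.run();
        } finally {
            scopes.removeLast();
        }
    }

    /** Hypothetical join helper that always uses the same fixed step names. */
    void innerJoin() {
        apply("CoGroup");
        apply("EmitPairs");
    }
}
```

In this model, calling innerJoin() twice on the same ToyPipeline throws on the
second call, whereas applyComposite("Join1", p::innerJoin) followed by
applyComposite("Join2", p::innerJoin) succeeds, since the inner steps become
"Join1/CoGroup" and "Join2/CoGroup". A name parameter on the join methods would
give the same effect without the extra wrapper.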