Daniel Mescheder created BEAM-6719:
--------------------------------------

             Summary: Allow multiple Joins in the same pipeline
                 Key: BEAM-6719
                 URL: https://issues.apache.org/jira/browse/BEAM-6719
             Project: Beam
          Issue Type: Improvement
          Components: sdk-java-join-library
            Reporter: Daniel Mescheder


Currently, it is not possible to apply multiple joins in the same pipeline 
without wrapping each one in its own PTransform, because the joins would 
otherwise produce clashing transform names.

Consider the following test case:
{code:java}
@Test
public void testMultipleJoinsInSamePipeline() {
  leftListOfKv.add(KV.of("Key2", 4L));
  PCollection<KV<String, Long>> leftCollection =
      p.apply("CreateLeft", Create.of(leftListOfKv));
  rightListOfKv.add(KV.of("Key2", "bar"));
  PCollection<KV<String, String>> rightCollection =
      p.apply("CreateRight", Create.of(rightListOfKv));
  expectedResult.add(KV.of("Key2", KV.of(4L, "bar")));

  // Two joins over the same inputs, applied directly to the same pipeline.
  PCollection<KV<String, KV<Long, String>>> output1 =
      Join.innerJoin(leftCollection, rightCollection);
  PCollection<KV<String, KV<Long, String>>> output2 =
      Join.innerJoin(leftCollection, rightCollection);

  // Both joins should yield the same result.
  PAssert.that(output1).containsInAnyOrder(expectedResult);
  PAssert.that(output2).containsInAnyOrder(expectedResult);
  p.run();
}
{code}
This fails because the two joins produce clashing transform names in the 
pipeline, and the join library currently offers no way to give each join a 
distinct name.

As a result, I routinely wrap joins in new PTransforms, which leads me to 
believe that this should be part of the library itself.
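
For illustration, here is a minimal sketch of the kind of wrapper I mean. The 
class name NamedInnerJoin is hypothetical and not part of the join library; it 
only assumes the existing Join.innerJoin(left, right) method. The right-hand 
collection is held as a transient field on the assumption that it should not be 
serialized along with the transform.

```java
import org.apache.beam.sdk.extensions.joinlibrary.Join;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

/**
 * Hypothetical wrapper (not part of the join library): applies an inner join
 * as a composite transform, so the caller can choose its name via apply(name, ...).
 */
public class NamedInnerJoin<K, V1, V2>
    extends PTransform<PCollection<KV<K, V1>>, PCollection<KV<K, KV<V1, V2>>>> {

  // Assumption: kept transient so the collection is not serialized with the transform.
  private final transient PCollection<KV<K, V2>> right;

  public NamedInnerJoin(PCollection<KV<K, V2>> right) {
    this.right = right;
  }

  @Override
  public PCollection<KV<K, KV<V1, V2>>> expand(PCollection<KV<K, V1>> left) {
    // The transforms applied inside the join are nested under this composite's name.
    return Join.innerJoin(left, right);
  }
}
```

With such a wrapper, the two joins in the test above could be applied as 
leftCollection.apply("Join1", new NamedInnerJoin<String, Long, String>(rightCollection)) 
and leftCollection.apply("Join2", new NamedInnerJoin<String, Long, String>(rightCollection)), 
so each join gets its own name.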


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
