[ 
https://issues.apache.org/jira/browse/BEAM-4461?focusedWorklogId=178774&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-178774
 ]

ASF GitHub Bot logged work on BEAM-4461:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 26/Dec/18 20:53
            Start Date: 26/Dec/18 20:53
    Worklog Time Spent: 10m 
      Work Description: reuvenlax commented on pull request #7353: [BEAM-4461] 
Support inner and outer style joins in CoGroup.
URL: https://github.com/apache/beam/pull/7353
 
 
   Multiple improvements to the schema CoGroup transform:
     * Allow the user to use strings instead of TupleTags. TupleTags existed to 
make Java type inference work, and this is not needed with the schema-based 
join as the types are in the schema. This also allows a simpler builder for 
PCollectionTuple.
   
     * Instead of multiple CoGroup.byFieldNames, byFieldIds, etc. the new 
syntax is CoGroup.join(By.fieldNames), CoGroup.join(By.fieldIds), etc. This 
shrinks the API surface area, and also gives a place to specify per-input 
options (used for outer joins).
   
     * Add a .crossProductJoin. This expands the iterables into a cross product 
of the matching records (an inner join). For example:
       PCollection<Row> innerJoined = inputs.apply(
           CoGroup.join("input1", By.fieldNames("user"))
                  .join("input2", By.fieldNames("user"))
                  .crossProductJoin());
   
     * Each input can be marked for "outer-join" participation semantics. This 
means that if no records for that input are present for a join key, an output 
is still generated from the cross product, with the value for that input 
replaced by a null. This generalizes normal left/right/full outer joins to N 
inputs. For example, with two inputs:
       PCollection<Row> leftOuterJoined = inputs.apply(
           CoGroup.join("input1", By.fieldNames("user").withOuterJoinParticipation())
                  .join("input2", By.fieldNames("user"))
                  .crossProductJoin());
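
To make the expansion rule concrete, here is a minimal plain-Java sketch of what the description above says happens for a single join key. It is an illustration only: it does not use Beam, it is not how CoGroup is implemented, and the class name CrossProductJoinSketch, the method expand, and the boolean "outer" flags are all hypothetical names chosen for this example. It follows the prose rule as written: an input flagged as outer contributes a single null when it has no records for the key, and an unflagged input with no records drops the key entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Plain-Java illustration only; hypothetical names, not Beam code.
class CrossProductJoinSketch {

  /**
   * Expands the per-input value lists for ONE join key into output rows.
   * valuesPerInput.get(i) holds the records of input i for this key.
   * outer.get(i) is true if input i was marked for outer-join participation.
   */
  static List<List<String>> expand(List<List<String>> valuesPerInput, List<Boolean> outer) {
    List<List<String>> rows = new ArrayList<>();
    rows.add(new ArrayList<>()); // start from a single empty partial row

    for (int i = 0; i < valuesPerInput.size(); i++) {
      List<String> values = valuesPerInput.get(i);
      if (values.isEmpty()) {
        if (!outer.get(i)) {
          // Inner semantics: a missing, unmarked input yields no output for this key.
          return new ArrayList<>();
        }
        // Outer semantics: stand in a single null for the missing input.
        values = Collections.singletonList(null);
      }
      // Extend every partial row with every value of input i (cross product).
      List<List<String>> next = new ArrayList<>();
      for (List<String> partial : rows) {
        for (String v : values) {
          List<String> row = new ArrayList<>(partial);
          row.add(v);
          next.add(row);
        }
      }
      rows = next;
    }
    return rows;
  }

  public static void main(String[] args) {
    // Both inputs have records: plain cross product.
    System.out.println(expand(
        Arrays.asList(Arrays.asList("a1", "a2"), Arrays.asList("b1")),
        Arrays.asList(false, false))); // prints [[a1, b1], [a2, b1]]

    // The first input is empty but marked outer: a null stands in for it.
    System.out.println(expand(
        Arrays.asList(new ArrayList<String>(), Arrays.asList("b1")),
        Arrays.asList(true, false))); // prints [[null, b1]]
  }
}
```

Because the flag is checked per input, the same loop covers any number of inputs, which is the sense in which the per-input marking generalizes two-input outer joins.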
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 178774)
    Time Spent: 19h  (was: 18h 50m)

> Create a library of useful transforms that use schemas
> ------------------------------------------------------
>
>                 Key: BEAM-4461
>                 URL: https://issues.apache.org/jira/browse/BEAM-4461
>             Project: Beam
>          Issue Type: Sub-task
>          Components: sdk-java-core
>            Reporter: Reuven Lax
>            Assignee: Reuven Lax
>            Priority: Major
>          Time Spent: 19h
>  Remaining Estimate: 0h
>
> e.g. JoinBy(fields). Project, Filter, etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
