GitHub user reuvenlax opened a pull request:

    https://github.com/apache/beam/pull/2415

    [BEAM-437] Support data-dependent writes using BigQuery batch load jobs

    This pull request adds support for data-dependent writes when using batch
    load jobs. It does so by refactoring BigQueryIO into separate transforms.
    The first is a common PrepareWrite transform that determines which table
    each record should go to; it is followed by transforms that consume those
    table assignments.
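
    To illustrate the two-stage shape described above, here is a minimal
    sketch in plain Java. It does not use the Beam API and is not the code in
    this pull request: the class PrepareWriteSketch, the Tagged type, and the
    methods prepareWrite and groupByTable are hypothetical names chosen for
    the example, as are the table specs. It only assumes that stage one tags
    each record with a destination table and that a later stage groups records
    per table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Illustrative only: models "tag with a table, then write per table". */
public class PrepareWriteSketch {

    /** A record paired with the table spec it was assigned to (hypothetical type). */
    static final class Tagged<T> {
        final String tableSpec;
        final T element;

        Tagged(String tableSpec, T element) {
            this.tableSpec = tableSpec;
            this.element = element;
        }
    }

    /** Stage 1: apply a user-supplied function to pick a table for each record. */
    static <T> List<Tagged<T>> prepareWrite(List<T> records, Function<T, String> tableFunction) {
        List<Tagged<T>> tagged = new ArrayList<>();
        for (T record : records) {
            tagged.add(new Tagged<>(tableFunction.apply(record), record));
        }
        return tagged;
    }

    /** Stage 2: group tagged records by table, so each table can be loaded as one batch. */
    static <T> Map<String, List<T>> groupByTable(List<Tagged<T>> tagged) {
        Map<String, List<T>> batches = new LinkedHashMap<>();
        for (Tagged<T> t : tagged) {
            batches.computeIfAbsent(t.tableSpec, k -> new ArrayList<>()).add(t.element);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> events = Arrays.asList("click:a", "view:b", "click:c");

        // Data-dependent destination: the table name is derived from the record itself.
        Function<String, String> byEventType =
            e -> "project:dataset.events_" + e.substring(0, e.indexOf(':'));

        // A constant destination is the special case of a function that ignores its input.
        Function<String, String> constantTable = e -> "project:dataset.all_events";

        groupByTable(prepareWrite(events, byEventType))
            .forEach((table, rows) -> System.out.println(table + " <- " + rows));
        groupByTable(prepareWrite(events, constantTable))
            .forEach((table, rows) -> System.out.println(table + " <- " + rows));
    }
}
```

    Because the first stage only produces (table, record) pairs, a custom
    step can in principle be placed between the two stages without changing
    either of them.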
    
    One side benefit of this refactoring is that the different components can
    be used on their own. For example, users have asked for dynamic creation
    of datasets in BigQueryIO. A user can now accomplish this by running
    PrepareWrite themselves, followed by their own custom transform to create
    datasets, and then the remaining transforms.
    
    To test this, BigQueryIOTest was modified to use a proper fake service,
    removing the dependency on Mockito.
    
    R: @jkff 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/reuvenlax/incubator-beam dynamic_writes_in_batch

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/2415.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2415
    
----
commit 6088bf19dc03bb5ca0ccb760c52793ae27dfc06b
Author: Reuven Lax <[email protected]>
Date:   2017-03-28T18:21:59Z

    Use tableRefFunction throughout BigQueryIO. Constant table writes use
    ConstantTableSpecFunction.

commit 73fa547e4ca2b44c4f11d7c7ed4d7ac77a701ad5
Author: Reuven Lax <[email protected]>
Date:   2017-03-28T19:53:27Z

    Add PrepareWrite transform.

commit 60040c4991ee2fe5572d3dd7e2dfd381e21cead8
Author: Reuven Lax <[email protected]>
Date:   2017-03-29T02:34:56Z

    Refactor streaming write branch into separate reusable components.

commit 359685ab997c934837c601610fec471b3da1dcbd
Author: Reuven Lax <[email protected]>
Date:   2017-03-29T14:34:10Z

    Refactor batch load job path, and add support for data-dependent tables.

commit c9a1f2916af5cd2837d4d73887005e3b2ceff401
Author: Reuven Lax <[email protected]>
Date:   2017-03-31T18:19:25Z

    Refactor batch loads, and add support for windowed writes.

commit 477b14f4952881d965f22b7591da1032dcfd0495
Author: Reuven Lax <[email protected]>
Date:   2017-03-31T21:16:48Z

    Update tests

commit a6fb0292879b7ff9a68de2884417a4efd21f6479
Author: Reuven Lax <[email protected]>
Date:   2017-04-01T01:53:04Z

    testing changes

commit 5a2a2dc55bb7339a5c17280ed6ad66cb13eef54d
Author: Reuven Lax <[email protected]>
Date:   2017-04-02T18:32:37Z

    Fix more tests

commit cc146874470b51b0295a02cdcb81effda03372af
Author: Reuven Lax <[email protected]>
Date:   2017-04-02T18:37:06Z

    Fix CheckStyle issues

commit 89f2dc88431e71f8d11cd9942c2ef653bfc1a2c1
Author: Reuven Lax <[email protected]>
Date:   2017-04-03T02:47:03Z

    Final tests all work now

commit 6662121da44f16d79718c68dccf6eb6a86329268
Author: Reuven Lax <[email protected]>
Date:   2017-04-03T02:57:50Z

    Some cleanups and comments

commit 257ccc06f10cd048b8190e124b241f3bd98c647b
Author: Reuven Lax <[email protected]>
Date:   2017-04-03T03:27:16Z

    Remove ReturnT

commit 1ad3720c0273a808cafc2dd4d6e096b4f492c42b
Author: Reuven Lax <[email protected]>
Date:   2017-04-03T04:39:50Z

    Separate streaming writes into two pluggable components - CreateTables, and
    StreamingWriteTables.

commit a111b148a2bf8bbb5f1119c0bff922c0801d0582
Author: Reuven Lax <[email protected]>
Date:   2017-04-03T04:43:16Z

    Checkstyle fixes

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
