GitHub user reuvenlax opened a pull request:
https://github.com/apache/beam/pull/3356
[BEAM-92] Allow value-dependent files in FileBasedSink
This is modeled on the pattern used in BigQueryIO. The user can provide
a DynamicDestinations class that maps each input element to a user-defined
destination type, and each destination to a FilenamePolicy. FilenamePolicy
is also refactored to make this more useful; for example, a FilenamePolicy
can now pick its own base directory instead of having it passed in from the
sink (FilenamePolicy is marked @Experimental, so such changes are allowed).
Not yet in this PR: we should allow the sinks to take user-defined types
for mapping. We should also provide a convenience method for dynamic output
using the DefaultFilenamePolicy (e.g. passing in KVs).
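As a rough illustration of the two-step mapping described above (element to destination, destination to filename policy), here is a minimal self-contained Java sketch. The types SimpleDynamicDestinations, SimpleFilenamePolicy, BySeverity and DynamicWriteDemo.assignFiles are hypothetical stand-ins invented for this example. They do not reproduce the signatures of Beam's actual DynamicDestinations or FilenamePolicy classes, and the sketch ignores windowing, sharding, temp files and coders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// HYPOTHETICAL simplified stand-in for a filename policy (not Beam's API).
interface SimpleFilenamePolicy {
  // Returns the full output path, including the base directory the policy chose itself.
  String filenameFor(int shard, int numShards, String extension);
}

// HYPOTHETICAL simplified stand-in for DynamicDestinations (not Beam's API).
abstract class SimpleDynamicDestinations<UserT, DestinationT> {
  // Step 1: map an input element to a user-defined destination.
  abstract DestinationT getDestination(UserT element);

  // Step 2: map a destination to the policy that names its files.
  abstract SimpleFilenamePolicy getFilenamePolicy(DestinationT destination);
}

// Example: route log lines to one directory per severity prefix ("ERROR: ...").
class BySeverity extends SimpleDynamicDestinations<String, String> {
  @Override
  String getDestination(String line) {
    int colon = line.indexOf(':');
    return colon < 0 ? "UNKNOWN" : line.substring(0, colon);
  }

  @Override
  SimpleFilenamePolicy getFilenamePolicy(String severity) {
    // The policy picks its own base directory instead of receiving one from the sink.
    String baseDir = "/output/" + severity.toLowerCase() + "/";
    return (shard, numShards, ext) ->
        String.format("%spart-%05d-of-%05d%s", baseDir, shard, numShards, ext);
  }
}

class DynamicWriteDemo {
  // Groups elements by destination, then names each group's single shard via its policy.
  static <UserT, DestT> Map<String, List<UserT>> assignFiles(
      List<UserT> input, SimpleDynamicDestinations<UserT, DestT> dests, String extension) {
    Map<DestT, List<UserT>> byDest = new LinkedHashMap<>();
    for (UserT element : input) {
      byDest.computeIfAbsent(dests.getDestination(element), k -> new ArrayList<>()).add(element);
    }
    Map<String, List<UserT>> byFile = new LinkedHashMap<>();
    for (Map.Entry<DestT, List<UserT>> entry : byDest.entrySet()) {
      String file = dests.getFilenamePolicy(entry.getKey()).filenameFor(0, 1, extension);
      byFile.put(file, entry.getValue());
    }
    return byFile;
  }

  public static void main(String[] args) {
    List<String> lines = Arrays.asList("ERROR: disk full", "INFO: started", "ERROR: timeout");
    Map<String, List<String>> files = assignFiles(lines, new BySeverity(), ".txt");
    files.forEach((file, contents) -> System.out.println(file + " -> " + contents));
  }
}
```

Running main groups the two ERROR lines under /output/error/ and the INFO line under /output/info/. Letting each destination's policy choose its own base directory mirrors the FilenamePolicy refactoring this PR describes.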
R: @jkff
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/reuvenlax/incubator-beam dynamic_file_based_sink
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/beam/pull/3356.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3356
----
commit 531a013f6f1e6865566f648d3486d77a7de20369
Author: Reuven Lax <[email protected]>
Date: 2017-06-10T00:11:32Z
Add DynamicDestinations support to FileBasedSink
commit 616f9ede5afa7e771f50595bcd290b907744d57f
Author: Reuven Lax <[email protected]>
Date: 2017-06-10T18:23:32Z
More fixups.
commit c6855fa130c86c94de74c36976a667de7f8d6219
Author: Reuven Lax <[email protected]>
Date: 2017-06-12T16:42:26Z
Fix some tests.
commit f4d4bf5f9238f7e9627363daa14158cdcc68d64b
Author: Reuven Lax <[email protected]>
Date: 2017-06-13T21:34:08Z
Remove baseDirectory parameter from FilenamePolicy. The FilenamePolicy can
choose its own base directory.
commit 219a3d649428f673bf35f043990d61000d5fd64e
Author: Reuven Lax <[email protected]>
Date: 2017-06-14T00:06:01Z
No longer pass in "extension" to FilenamePolicy. Instead we pass in a file
metadata class, including a getSuggestedExtension method that is based on the
compression type.
commit 1c5ed9e39e891b4418f597d531f442806b4f22b8
Author: Reuven Lax <[email protected]>
Date: 2017-06-14T00:41:22Z
Add a withTempDirectory override to TextIO and AvroIO. This way users
aren't forced to provide a dummy file prefix just to specify a temp directory.
commit 14bf19544801ce0791ffd2f95574ba9b5b9d33c6
Author: Reuven Lax <[email protected]>
Date: 2017-06-14T00:49:34Z
Fix validation code.
commit 93727530ac83e70570f6da1214fa20c5871b97e6
Author: Reuven Lax <[email protected]>
Date: 2017-06-14T00:52:08Z
Fix javadoc.
commit da6b0e648fac9d9def26c37aa755b73a5a3f9cef
Author: Reuven Lax <[email protected]>
Date: 2017-06-14T02:29:00Z
Fix CheckStyle violations.
commit d2624f729cb7f7dc49f9dda65f6db7744f21e3c6
Author: Reuven Lax <[email protected]>
Date: 2017-06-14T04:01:26Z
Fix some failures.
----