A common pattern is the following:

p.apply(Create.of((Void) null))
  .apply(MapElements.via((Void v) -> /* once operation */));

Of course, as is always the case with any Beam DoFn, your operation might
be executed multiple times (e.g. if something fails before the runner
commits the fact that the operation has succeeded), so you need to ensure
that the operation is idempotent.
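
To make the idempotence point concrete, here is a minimal sketch in plain
Java, independent of Beam, modeled on the "delete tmp files" step in the
quoted message. The class name IdempotentCleanup, the method deleteTmpFile,
and the use of a local temp file in place of an S3 object are assumptions
made for illustration only. The idea is that the operation is defined by its
end state ("the file is gone") rather than by the action ("delete the file"),
so a second execution is a harmless no-op.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class IdempotentCleanup {

    /**
     * Deletes a temporary file if it is still present (hypothetical helper).
     * Returns true if this call removed the file, false if it was already
     * gone. The end state is the same either way, so a retry is safe.
     */
    public static boolean deleteTmpFile(Path tmpFile) {
        try {
            return Files.deleteIfExists(tmpFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // A local temp file stands in for a temporary object in S3.
        Path tmp = Files.createTempFile("tmp-prefix", ".part");

        System.out.println(deleteTmpFile(tmp)); // first attempt: prints true
        System.out.println(deleteTmpFile(tmp)); // simulated retry: prints false
        System.out.println(Files.exists(tmp));  // prints false
    }
}
```

A non-idempotent version would call Files.delete instead, which throws
NoSuchFileException on the retry and would fail a bundle that had in fact
already done its work.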

Reuven

On Wed, Sep 27, 2017 at 8:51 AM, Jacob Marble <[email protected]> wrote:

> I have been thinking on a Redshift reader/writer, basically to wrap UNLOAD
> and COPY in a PTransform. For example, steps to UNLOAD into a PCollection:
>
> 1) JDBC to Redshift - UNLOAD
> <http://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html> TO
> 's3://bucket/tmp-prefix'
> 2) S3 to PCollection - work in progress
> <https://github.com/Kochava/beam-s3>
> 3) delete tmp files from S3
>
> To implement steps 1 and 3, I can't see a way to perform a task exactly
> once, globally, in a PTransform. Sure, I could do those steps in main() or
> even in a separate script, but the result isn't code that can be shared and
> reused very well.
>
> Am I missing something? Seems like the kind of problem that I shouldn't be
> the first to encounter.
>
> Thanks,
>
> Jacob
>
