Thanks, Reuven, that makes sense for step 1. After sending my original
message, I started down the BoundedSource path, but I think your approach
could be better.

Do you know any trick for step 3?
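
On the idempotency caveat in Reuven's reply below: for step 3, one way to
make the delete safe to retry might be to treat "already deleted" as
success. Here is a minimal sketch in plain Java, assuming a local directory
as a stand-in for the S3 prefix. IdempotentCleanup, deleteTmpFiles, and demo
are hypothetical names, and none of this is Beam or AWS API:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Beam or any AWS SDK. A local directory
// stands in for the S3 bucket so the sketch runs with the JDK alone.
class IdempotentCleanup {

  // Deletes every regular file in dir whose name starts with prefix and
  // returns how many files this call removed. Safe to call repeatedly: a
  // file that is already gone is skipped rather than treated as an error.
  static int deleteTmpFiles(Path dir, String prefix) {
    int deleted = 0;
    DirectoryStream.Filter<Path> isTmpFile =
        p -> Files.isRegularFile(p) && p.getFileName().toString().startsWith(prefix);
    try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, isTmpFile)) {
      for (Path file : files) {
        // deleteIfExists returns false instead of throwing when the file
        // has already been removed, e.g. by an earlier attempt.
        if (Files.deleteIfExists(file)) {
          deleted++;
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return deleted;
  }

  // Runs the cleanup twice against a scratch directory, simulating a runner
  // retry, and returns the number of files each attempt deleted.
  static int[] demo() {
    try {
      Path dir = Files.createTempDirectory("unload-demo");
      Files.createFile(dir.resolve("tmp-prefix0000_part_00"));
      Files.createFile(dir.resolve("tmp-prefix0001_part_00"));
      Files.createFile(dir.resolve("keep-me"));
      int first = deleteTmpFiles(dir, "tmp-prefix");
      int second = deleteTmpFiles(dir, "tmp-prefix");
      return new int[] {first, second};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    int[] counts = demo();
    System.out.println(
        "first attempt deleted " + counts[0] + ", retry deleted " + counts[1]);
  }
}
```

In this sketch the first attempt deletes the two tmp-prefix files and the
retry deletes nothing and still succeeds. I would expect the same shape to
carry over to an S3 delete run from the once-only step, with a real S3
client swapped in, but I have not tried that.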

On Wed, Sep 27, 2017 at 8:58 AM, Reuven Lax <[email protected]>
wrote:

> A common pattern is the following:
>
> p.apply(Create.of((Void) null))
>   .apply(MapElements.via((Void v) -> /* once operation */));
>
> Of course, as is always the case with any Beam DoFn, your operation might
> be executed multiple times (e.g. if something fails before the runner
> commits the fact that the operation has succeeded). You need to ensure
> that the operation is idempotent.
>
> Reuven
>
> On Wed, Sep 27, 2017 at 8:51 AM, Jacob Marble <[email protected]> wrote:
>
> > I have been thinking on a Redshift reader/writer, basically to wrap UNLOAD
> > and COPY in a PTransform. For example, steps to UNLOAD into a PCollection:
> >
> > 1) JDBC to Redshift - UNLOAD
> > <http://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html> TO
> > 's3://bucket/tmp-prefix'
> > 2) S3 to PCollection - work in progress
> > <https://github.com/Kochava/beam-s3>
> > 3) delete tmp files from S3
> >
> > To implement steps 1 and 3, I can't see a way to perform a task exactly
> > once, globally, in a PTransform. Sure, I could do those steps in main() or
> > even in a separate script, but the result isn't code that can be shared and
> > reused very well.
> >
> > Am I missing something? Seems like the kind of problem that I shouldn't be
> > the first to encounter.
> >
> > Thanks,
> >
> > Jacob
> >
>



-- 
Jacob
