Thanks, Reuven, that makes sense for step 1. After sending my original message, I started down the path of BoundedSource, but I think this could be better.
Do you know any trick for step 3?

On Wed, Sep 27, 2017 at 8:58 AM, Reuven Lax <[email protected]> wrote:

> A common pattern is the following
>
> p.apply(Create.of((Void) null))
>  .apply(MapElements.via((Void v) -> /* once operation */);
>
> Of course as is always the case with any Beam DoFn, your operation might be
> executed multiple times (e.g. if something fails before the runner commits
> the fact that the operation has succeeded). You need to ensure that the
> operation is idempotent.
>
> Reuven
>
> On Wed, Sep 27, 2017 at 8:51 AM, Jacob Marble <[email protected]> wrote:
>
> > I have been thinking on a Redshift reader/writer, basically to wrap UNLOAD
> > and COPY in a PTransform. For example, steps to UNLOAD into a PCollection:
> >
> > 1) JDBC to Redshift - UNLOAD
> > <http://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html> TO
> > 's3://bucket/tmp-prefix'
> > 2) S3 to PCollection - work in progress
> > <https://github.com/Kochava/beam-s3>
> > 3) delete tmp files from S3
> >
> > To implement steps 1 and 3, I can't see a way to perform a task exactly
> > once, globally, in a PTransform. Sure, I could do those steps in main() or
> > even in a separate script, but the result isn't code that can be shared and
> > reused very well.
> >
> > Am I missing something? Seems like the kind of problem that I shouldn't be
> > the first to encounter.
> >
> > Thanks,
> >
> > Jacob

--
Jacob
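
Because an operation run from a DoFn may execute more than once, the step 3 cleanup has to tolerate repeats. Here is a minimal sketch of an idempotent delete in plain Java. It assumes a local directory as a stand-in for the S3 bucket, and `TmpFileCleanup` and `deleteByPrefix` are hypothetical names, not part of Beam or the AWS SDK. A real version would presumably list and delete S3 objects under the prefix instead.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: a local directory stands in for the S3 bucket.
final class TmpFileCleanup {

    // Deletes every regular file in dir whose name starts with prefix and
    // returns how many files this particular call removed.
    static int deleteByPrefix(Path dir, String prefix) throws IOException {
        if (!Files.isDirectory(dir)) {
            return 0; // nothing left to clean up; treat as already done
        }
        int deleted = 0;
        DirectoryStream.Filter<Path> matchesPrefix = p ->
                Files.isRegularFile(p)
                        && p.getFileName().toString().startsWith(prefix);
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, matchesPrefix)) {
            for (Path file : files) {
                // deleteIfExists returns false instead of throwing when an
                // earlier or concurrent attempt already removed the file.
                if (Files.deleteIfExists(file)) {
                    deleted++;
                }
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("unload-demo");
        Files.createFile(dir.resolve("tmp-prefix-0000"));
        Files.createFile(dir.resolve("tmp-prefix-0001"));
        Files.createFile(dir.resolve("keep-me"));

        System.out.println(deleteByPrefix(dir, "tmp-prefix")); // first attempt
        System.out.println(deleteByPrefix(dir, "tmp-prefix")); // simulated retry
    }
}
```

The second call finds nothing to delete and returns normally, so a retried bundle would not fail on files that a previous attempt already removed. This does not address the ordering problem (running the delete only after step 2 has finished reading); it only makes the delete itself safe to repeat.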
