Create is essentially a BoundedSource under the covers.

There are multiple ways to handle step 3. One is to produce a
PCollection<String> containing the filenames. You could then attach a Void
key (using WithKeys), GroupByKey (GBK) the filenames together so that they
all arrive as a single group, and delete them in the next step.
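
Since any Beam step may run more than once, the delete step has to tolerate
retries. Below is a minimal sketch of such an idempotent delete. It assumes
local files, accessed through java.nio.file, as a stand-in for S3 objects.
TmpFileCleaner and deleteAll are hypothetical names and are not part of Beam
or beam-s3. A real version would presumably make the equivalent S3 client
call from the step that follows the GroupByKey.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; local files stand in for S3 objects.
class TmpFileCleaner {

  // Deletes every named file and returns how many were actually removed.
  // Files that are already gone are skipped rather than treated as errors,
  // so running the same delete a second time is harmless.
  public static int deleteAll(Iterable<String> filenames) throws IOException {
    int deleted = 0;
    for (String name : filenames) {
      if (Files.deleteIfExists(Paths.get(name))) {
        deleted++;
      }
    }
    return deleted;
  }

  public static void main(String[] args) throws IOException {
    // Create two temporary files to play the role of the UNLOAD output.
    Path a = Files.createTempFile("tmp-prefix", ".part");
    Path b = Files.createTempFile("tmp-prefix", ".part");
    List<String> names = Arrays.asList(a.toString(), b.toString());

    System.out.println(deleteAll(names)); // first attempt removes both files
    System.out.println(deleteAll(names)); // simulated retry finds nothing left
  }
}
```

The point of the sketch is only that a missing file counts as success, which
is what makes a repeated execution safe.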

Reuven

On Wed, Sep 27, 2017 at 9:04 AM, Jacob Marble <[email protected]> wrote:

> Thanks, Reuven, that makes sense for step 1. After sending my original
> message, I started down the path of BoundedSource, but I think this could
> be better.
>
> Do you know any trick for step 3?
>
> On Wed, Sep 27, 2017 at 8:58 AM, Reuven Lax <[email protected]>
> wrote:
>
> > A common pattern is the following:
> >
> > p.apply(Create.of((Void) null))
> >   .apply(MapElements.via((Void v) -> /* once operation */));
> >
> > Of course, as is always the case with any Beam DoFn, your operation
> > might be executed multiple times (e.g. if something fails before the
> > runner commits the fact that the operation has succeeded). You need to
> > ensure that the operation is idempotent.
> >
> > Reuven
> >
> > On Wed, Sep 27, 2017 at 8:51 AM, Jacob Marble <[email protected]> wrote:
> >
> > > I have been thinking on a Redshift reader/writer, basically to wrap
> > > UNLOAD and COPY in a PTransform. For example, steps to UNLOAD into a
> > > PCollection:
> > >
> > > 1) JDBC to Redshift - UNLOAD
> > > <http://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html> TO
> > > 's3://bucket/tmp-prefix'
> > > 2) S3 to PCollection - work in progress
> > > <https://github.com/Kochava/beam-s3>
> > > 3) delete tmp files from S3
> > >
> > > To implement steps 1 and 3, I can't see a way to perform a task exactly
> > > once, globally, in a PTransform. Sure, I could do those steps in main()
> > > or even in a separate script, but the result isn't code that can be
> > > shared and reused very well.
> > >
> > > Am I missing something? Seems like the kind of problem that I shouldn't
> > > be the first to encounter.
> > >
> > > Thanks,
> > >
> > > Jacob
> > >
> >
>
>
>
> --
> Jacob
>