Hi,

Sorry for the delay. So it sounds like you want to do something after writing
a window of data to BigQuery completes.
I think this should be possible: the expansion of BigQueryIO.write() returns a
WriteResult, and you can apply other transforms to it. Have you tried that?
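
For example, something along these lines could work (a rough sketch only, not
tested; the table name, write disposition and the failure-handling DoFn are
placeholders, and getFailedInserts() assumes streaming inserts):

  // rows: a PCollection<TableRow> built from the converted documents
  WriteResult result = rows.apply("WriteToBigQuery",
      BigQueryIO.writeTableRows()
          .to("my-project:my_dataset.my_table")  // placeholder table
          .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

  // WriteResult is a POutput, so further transforms can hang off of it.
  // With streaming inserts, getFailedInserts() gives back the rows that
  // could not be written:
  result.getFailedInserts()
      .apply("HandleFailedInserts", ParDo.of(new DoFn<TableRow, Void>() {
        @ProcessElement
        public void processElement(ProcessContext c) {
          // e.g. log the failed row or record it in a status table
        }
      }));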

On Sat, Aug 26, 2017 at 1:10 PM Chaim Turkel <[email protected]> wrote:

> I have documents in a MongoDB that I need to migrate to BigQuery.
> Since it is MongoDB, I do not know the schema ahead of time, so I have
> two pipelines: one runs over the documents and updates the BigQuery
> schema, then waits a few minutes (it can take a while for BigQuery to
> be able to use the new schema); the other pipeline then copies all the
> documents.
> To know how far I got with the different pipelines, I have a status
> table, so that at the start I know where to continue from.
> So I need a way to update the status table with the success of the
> copy and a timestamp of the last copied document.
>
>
> chaim
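
For concreteness, the launcher for that flow might look roughly like this (a
sketch only; the five-minute wait and the status-table helper are made up for
illustration):

  // Run the schema-update pipeline first and wait for it to finish.
  PipelineResult schemaJob = schemaUpdatePipeline.run();
  schemaJob.waitUntilFinish();

  // Give BigQuery some time to start accepting the new schema.
  TimeUnit.MINUTES.sleep(5);

  // Then run the copy pipeline and record progress on success.
  PipelineResult copyJob = copyPipeline.run();
  if (copyJob.waitUntilFinish() == PipelineResult.State.DONE) {
    // Hypothetical helper: record success and the timestamp of the
    // last copied document in the status table.
    updateStatusTable(lastCopiedDocumentTimestamp);
  }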
>
> On Fri, Aug 25, 2017 at 6:53 PM, Eugene Kirpichov
> <[email protected]> wrote:
> > I'd like to know more about both of your use cases; can you clarify? I
> > think making sinks output something that can be waited on by another
> > pipeline step is a reasonable request, but more details would help refine
> > this suggestion.
> >
> > On Fri, Aug 25, 2017, 8:46 AM Chamikara Jayalath <[email protected]>
> > wrote:
> >
> >> Can you do this from the program that runs the Beam job, after the job
> >> is complete (you might have to use a blocking runner or poll for the
> >> status of the job)?
> >>
> >> - Cham
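
That approach could look roughly like this (again just a sketch; the polling
interval and the status update are placeholders):

  // Run the job and poll its state from the launching program.
  PipelineResult result = pipeline.run();
  while (!result.getState().isTerminal()) {
    TimeUnit.SECONDS.sleep(30);  // or result.waitUntilFinish() to block
  }
  if (result.getState() == PipelineResult.State.DONE) {
    updateStatusTable();  // hypothetical status-table update
  }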
> >>
> >> On Fri, Aug 25, 2017 at 8:44 AM Steve Niemitz <[email protected]>
> >> wrote:
> >>
> >> > I also have a similar use case (but with Bigtable) that I feel like
> >> > I had to hack up to make it work.  It'd be great to hear if there is
> >> > already a way to do something like this, or if there are plans for it
> >> > in the future.
> >> >
> >> > On Fri, Aug 25, 2017 at 9:46 AM, Chaim Turkel <[email protected]>
> >> > wrote:
> >> >
> >> > > Hi,
> >> > >   I have a few pipelines that are an ETL from different systems to
> >> > > BigQuery.
> >> > > I would like to write the status of the ETL after all records have
> >> > > been written to BigQuery.
> >> > > The problem is that writing to BigQuery is a sink, and you cannot
> >> > > have any other steps after the sink.
> >> > > I tried a side output, but it is called with no correlation to the
> >> > > writing to BigQuery, so I don't know if it succeeded or failed.
> >> > >
> >> > >
> >> > > Any ideas?
> >> > > chaim
> >> > >
> >> >
> >>
>
