So what you are saying is that windowing is not supported for the
BigQuery sink? That doesn't make sense to me, since I am already using
the window to pick the table partition, and that works fine.
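
For context, the way the window drives the partition looks roughly like
this (a sketch only, with placeholder project/dataset/table names, assuming
the Beam 2.x Java SDK):

    import com.google.api.services.bigquery.model.TableRow;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
    import org.apache.beam.sdk.io.gcp.bigquery.TableDestination;
    import org.apache.beam.sdk.transforms.SerializableFunction;
    import org.apache.beam.sdk.transforms.windowing.IntervalWindow;
    import org.apache.beam.sdk.values.ValueInSingleWindow;
    import org.joda.time.format.DateTimeFormat;

    // "rows" is a fixed-windowed PCollection<TableRow>; each element's window
    // chooses the daily partition decorator ($yyyyMMdd) of the target table.
    rows.apply(
        BigQueryIO.writeTableRows()
            .to(new SerializableFunction<ValueInSingleWindow<TableRow>, TableDestination>() {
              @Override
              public TableDestination apply(ValueInSingleWindow<TableRow> value) {
                IntervalWindow w = (IntervalWindow) value.getWindow();
                String day = w.start().toString(DateTimeFormat.forPattern("yyyyMMdd"));
                return new TableDestination("my-project:my_dataset.my_table$" + day, null);
              }
            })
            // The partitioned table already exists, so no schema is needed here.
            .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER));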

On Sat, Sep 9, 2017 at 7:40 PM, Steve Niemitz <[email protected]> wrote:
> I wonder if it makes sense to start simple and go from there.  For example,
> I enhanced BigtableIO.Write to output the number of rows written
> in finishBundle(), simply into the global window with the current
> timestamp.  This was more than enough to unblock me, but doesn't support
> more complicated scenarios with windowing.
>
> However, as I said, it was more than enough to solve the general batch use
> case, and I imagine it could be enhanced to support windowing by keeping
> track of which windows were written per bundle. (Can there even be more
> than one window per bundle?)
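>
> Roughly, the shape of that hack, as a standalone sketch rather than the
> actual BigtableIO.Write change (Beam 2.x Java SDK):
>
>   import org.apache.beam.sdk.transforms.DoFn;
>   import org.apache.beam.sdk.transforms.windowing.GlobalWindow;
>   import org.joda.time.Instant;
>
>   // Counts elements per bundle and emits the count from finishBundle()
>   // into the global window with the current timestamp.
>   class EmitCountPerBundleFn<T> extends DoFn<T, Long> {
>     private long count;
>
>     @StartBundle
>     public void startBundle() {
>       count = 0;
>     }
>
>     @ProcessElement
>     public void processElement(ProcessContext c) {
>       count++;  // in the real version this is the number of rows written
>     }
>
>     @FinishBundle
>     public void finishBundle(FinishBundleContext c) {
>       c.output(count, Instant.now(), GlobalWindow.INSTANCE);
>     }
>   }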
>
> On Fri, Sep 8, 2017 at 2:32 PM, Eugene Kirpichov <
> [email protected]> wrote:
>
>> Hi,
>> I was going to implement this, but discussed it with +Reuven Lax
>> <[email protected]> and it appears to be quite difficult to do properly, or
>> even to define what it means at all, especially if you're using the
>> streaming inserts write method. So for now there is no workaround except
>> programmatically waiting for your whole pipeline to finish
>> (pipeline.run().waitUntilFinish()).
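>>
>> As a sketch, that workaround looks something like this
>> (updateStatusTable() is just a placeholder for your own bookkeeping):
>>
>>   // PipelineResult is org.apache.beam.sdk.PipelineResult.
>>   PipelineResult result = pipeline.run();
>>   // Blocks until the whole pipeline, including the BigQuery write, is done.
>>   PipelineResult.State state = result.waitUntilFinish();
>>   if (state == PipelineResult.State.DONE) {
>>     updateStatusTable();  // placeholder: record success in the status table
>>   }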
>>
>> On Fri, Sep 8, 2017 at 2:19 AM Chaim Turkel <[email protected]> wrote:
>>
>> > Is there a way around this for now?
>> > How can I get a snapshot version?
>> >
>> > chaim
>> >
>> > On Tue, Sep 5, 2017 at 8:48 AM, Eugene Kirpichov
>> > <[email protected]> wrote:
>> > > Oh I see! Okay, this should be easy to fix. I'll take a look.
>> > >
>> > > On Mon, Sep 4, 2017 at 10:23 PM Chaim Turkel <[email protected]> wrote:
>> > >
>> > >> WriteResult does not support apply() - that is the problem.
>> > >>
>> > >> On Tue, Sep 5, 2017 at 4:59 AM, Eugene Kirpichov
>> > >> <[email protected]> wrote:
>> > >> > Hi,
>> > >> >
>> > >> > Sorry for the delay. So it sounds like you want to do something
>> > >> > after writing a window of data to BigQuery is complete.
>> > >> > I think this should be possible: expansion of BigQueryIO.write()
>> > >> > returns a WriteResult, and you can apply other transforms to it.
>> > >> > Have you tried that?
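>> > >> >
>> > >> > As a sketch of that direction (placeholder table name; in recent Beam
>> > >> > versions the streaming-inserts failures come back from the
>> > >> > WriteResult as a PCollection you can keep transforming):
>> > >> >
>> > >> >   // (org.apache.beam.sdk.* imports omitted for brevity)
>> > >> >   WriteResult writeResult = rows.apply("WriteToBQ",
>> > >> >       BigQueryIO.writeTableRows().to("my-project:my_dataset.my_table"));
>> > >> >
>> > >> >   // getFailedInserts() is a PCollection<TableRow>, so ordinary
>> > >> >   // transforms can be applied downstream of the write.
>> > >> >   writeResult.getFailedInserts()
>> > >> >       .apply("HandleFailures", ParDo.of(new DoFn<TableRow, Void>() {
>> > >> >         @ProcessElement
>> > >> >         public void processElement(ProcessContext c) {
>> > >> >           // placeholder: record or log the failed row c.element()
>> > >> >         }
>> > >> >       }));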
>> > >> >
>> > >> > On Sat, Aug 26, 2017 at 1:10 PM Chaim Turkel <[email protected]>
>> > >> > wrote:
>> > >> >
>> > >> >> I have documents in a MongoDB that I need to migrate to BigQuery.
>> > >> >> Since it is MongoDB, I do not know the schema ahead of time, so I
>> > >> >> have two pipelines: one runs over the documents and updates the
>> > >> >> BigQuery schema, then waits a few minutes (it can take a while for
>> > >> >> BigQuery to be able to use the new schema), and then the other
>> > >> >> pipeline copies all the documents.
>> > >> >> To know how far I got with the different pipelines, I have a status
>> > >> >> table, so that at the start I know from where to continue.
>> > >> >> So I need the option to update the status table with the success of
>> > >> >> the copy and some time value of the last copied document.
>> > >> >>
>> > >> >>
>> > >> >> chaim
>> > >> >>
>> > >> >> On Fri, Aug 25, 2017 at 6:53 PM, Eugene Kirpichov
>> > >> >> <[email protected]> wrote:
>> > >> >> > I'd like to know more about both of your use cases, can you
>> > >> >> > clarify? I think making sinks output something that can be waited
>> > >> >> > on by another pipeline step is a reasonable request, but more
>> > >> >> > details would help refine this suggestion.
>> > >> >> >
>> > >> >> > On Fri, Aug 25, 2017, 8:46 AM Chamikara Jayalath <[email protected]>
>> > >> >> > wrote:
>> > >> >> >
>> > >> >> >> Can you do this from the program that runs the Beam job, after
>> > >> >> >> the job is complete? (You might have to use a blocking runner or
>> > >> >> >> poll for the status of the job.)
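>> > >> >> >>
>> > >> >> >> A sketch of the polling variant (the interval and
>> > >> >> >> updateStatusTable() are placeholders):
>> > >> >> >>
>> > >> >> >>   PipelineResult result = pipeline.run();
>> > >> >> >>   while (!result.getState().isTerminal()) {
>> > >> >> >>     try {
>> > >> >> >>       Thread.sleep(30_000);  // poll every 30 seconds
>> > >> >> >>     } catch (InterruptedException e) {
>> > >> >> >>       Thread.currentThread().interrupt();
>> > >> >> >>       break;
>> > >> >> >>     }
>> > >> >> >>   }
>> > >> >> >>   if (result.getState() == PipelineResult.State.DONE) {
>> > >> >> >>     updateStatusTable();  // placeholder: record success
>> > >> >> >>   }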
>> > >> >> >>
>> > >> >> >> - Cham
>> > >> >> >>
>> > >> >> >> On Fri, Aug 25, 2017 at 8:44 AM Steve Niemitz <[email protected]>
>> > >> >> >> wrote:
>> > >> >> >>
>> > >> >> >> > I also have a similar use case (but with BigTable) that I feel
>> > >> >> >> > like I had to hack up to make work.  It'd be great to hear if
>> > >> >> >> > there is a way to do something like this already, or if there
>> > >> >> >> > are plans in the future.
>> > >> >> >> >
>> > >> >> >> > On Fri, Aug 25, 2017 at 9:46 AM, Chaim Turkel <[email protected]>
>> > >> >> >> > wrote:
>> > >> >> >> >
>> > >> >> >> > > Hi,
>> > >> >> >> > >   I have a few pipelines that are an ETL from different
>> > >> >> >> > > systems to BigQuery.
>> > >> >> >> > > I would like to write the status of the ETL after all records
>> > >> >> >> > > have been updated in BigQuery.
>> > >> >> >> > > The problem is that writing to BigQuery is a sink, and you
>> > >> >> >> > > cannot have any other steps after the sink.
>> > >> >> >> > > I tried a side output, but this is called with no correlation
>> > >> >> >> > > to the writing to BigQuery, so I don't know if it succeeded or
>> > >> >> >> > > failed.
>> > >> >> >> > >
>> > >> >> >> > >
>> > >> >> >> > > any ideas?
>> > >> >> >> > > chaim
>> > >> >> >> > >
>> > >> >> >> >
>> > >> >> >>
>> > >> >>
>> > >>
>> >
>>
