Chaim,

Batch loads in BigQuery load the entire table in one operation, which means
there is no individual record failure to catch: if the load job fails, the
entire table load fails with it.

Do you know what kind of errors you are getting? If there are malformed
entries, can you add a ParDo beforehand to clean the data?
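
If it helps, the pattern looks roughly like this (just a sketch: the
String element type of `input`, and the convertToTableRow() helper are
placeholders for your own parsing logic, not Beam APIs):

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;

// Tags for the main ("good") output and the dead-letter side output.
final TupleTag<TableRow> goodRows = new TupleTag<TableRow>() {};
final TupleTag<String> badRows = new TupleTag<String>() {};

PCollectionTuple results = input.apply("CleanRows",
    ParDo.of(new DoFn<String, TableRow>() {
      @ProcessElement
      public void processElement(ProcessContext c) {
        try {
          // convertToTableRow() is your own parsing code; it may throw
          // on malformed input.
          c.output(convertToTableRow(c.element()));
        } catch (Exception e) {
          // Route the raw record to the side output instead of
          // failing the bundle.
          c.output(badRows, c.element());
        }
      }
    }).withOutputTags(goodRows, TupleTagList.of(badRows)));

PCollection<TableRow> cleaned = results.get(goodRows); // feed to BigQueryIO
PCollection<String> failed = results.get(badRows);     // log or write out

In real code you would probably make the DoFn a static class so it
serializes cleanly, and write the failed collection out somewhere (GCS, a
dead-letter table) for debugging.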

Reuven

On Sat, Sep 16, 2017 at 11:07 AM, Chaim Turkel <[email protected]> wrote:

> I am using batch, since streaming inserts cannot be done for partitions
> with data older than 30 days.
> The question is: how can I catch the exception in the pipeline so that
> the other collections do not fail?
>
> On Fri, Sep 15, 2017 at 7:37 PM, Eugene Kirpichov
> <[email protected]> wrote:
> > Are you using streaming inserts or batch loads method for writing?
> > If it's streaming inserts, BigQueryIO can already return the bad records,
> > and I believe it won't fail the pipeline, so I'm assuming it's batch
> > loads.
> > For batch loads, would it be sufficient for your purposes if
> > BigQueryIO.write() let you configure the configuration.load.maxBadRecords
> > parameter (see
> > https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs)?
> >
> > On Thu, Sep 14, 2017 at 10:29 PM Chaim Turkel <[email protected]> wrote:
> >
> >> I am using the sink of BigQueryIO, so the example is not the same: the
> >> example handles bad data on the read side, while my problem is when
> >> writing. Multiple errors can occur when writing to BigQuery, and if it
> >> fails there is no way to catch the error, so the whole pipeline fails.
> >>
> >> chaim
> >>
> >> On Thu, Sep 14, 2017 at 5:48 PM, Reuven Lax <[email protected]>
> >> wrote:
> >> > What sort of error? You can always put a try/catch inside your DoFns
> >> > to catch the majority of errors. A common pattern is to save records
> >> > that caused exceptions out to a separate output so you can debug them.
> >> > This blog post
> >> > <https://cloud.google.com/blog/big-data/2016/01/handling-invalid-inputs-in-dataflow>
> >> > explains the pattern.
> >> >
> >> > Reuven
> >> >
> >> > On Thu, Sep 14, 2017 at 1:43 AM, Chaim Turkel <[email protected]>
> >> > wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >>   In one pipeline I have multiple PCollections. If I have an error on
> >> >> one, the whole pipeline is canceled. Is there a way to catch the
> >> >> error and log it, and for all other PCollections to continue?
> >> >>
> >> >>
> >> >> chaim
> >> >>
> >>
>