I am using batch loads, since streaming inserts cannot be used for partitions whose data is older than 30 days. The question is: how can I catch the exception in the pipeline so that the other collections do not fail?
On Fri, Sep 15, 2017 at 7:37 PM, Eugene Kirpichov <[email protected]> wrote:
> Are you using streaming inserts or batch loads method for writing?
> If it's streaming inserts, BigQueryIO already can return the bad records,
> and I believe it won't fail the pipeline, so I'm assuming it's batch loads.
> For batch loads, would it be sufficient for your purposes if
> BigQueryIO.read() let you configure the configuration.load.maxBadRecords
> parameter (see https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs)?
>
> On Thu, Sep 14, 2017 at 10:29 PM Chaim Turkel <[email protected]> wrote:
>
>> I am using the sink of BigQueryIO, so the example is not the same. The
>> example is about bad data from reading; I have problems when writing. There
>> can be multiple errors when writing to BigQuery, and if it fails there
>> is no way to catch this error, and the whole pipeline fails.
>>
>> chaim
>>
>> On Thu, Sep 14, 2017 at 5:48 PM, Reuven Lax <[email protected]> wrote:
>> > What sort of error? You can always put a try/catch inside your DoFns to
>> > catch the majority of errors. A common pattern is to save records that
>> > caused exceptions out to a separate output so you can debug them. This blog
>> > post
>> > <https://cloud.google.com/blog/big-data/2016/01/handling-invalid-inputs-in-dataflow>
>> > explained the pattern.
>> >
>> > Reuven
>> >
>> > On Thu, Sep 14, 2017 at 1:43 AM, Chaim Turkel <[email protected]> wrote:
>> >
>> >> Hi,
>> >>
>> >> In one pipeline I have multiple PCollections. If I have an error on
>> >> one, then the whole pipeline is canceled. Is there a way to catch the
>> >> error and log it, and for all other PCollections to continue?
>> >>
>> >> chaim
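
For reference, a minimal sketch of the try/catch + side-output pattern Reuven mentions, in the Beam Java SDK. Here 'input' is assumed to be a PCollection<String>, and toTableRow is a hypothetical conversion method that may throw; the tag names are placeholders.

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;

// Tags for the two outputs: rows that converted cleanly, and raw inputs that threw.
final TupleTag<TableRow> validRows = new TupleTag<TableRow>() {};
final TupleTag<String> deadLetter = new TupleTag<String>() {};

PCollectionTuple converted = input.apply("ConvertToTableRow",
    ParDo.of(new DoFn<String, TableRow>() {
      @ProcessElement
      public void processElement(ProcessContext c) {
        try {
          // toTableRow is a placeholder for whatever per-record work can throw.
          c.output(toTableRow(c.element()));
        } catch (Exception e) {
          // Route the failing element to the dead-letter output instead of
          // letting the exception fail the whole pipeline.
          c.output(deadLetter, c.element());
        }
      }
    }).withOutputTags(validRows, TupleTagList.of(deadLetter)));

PCollection<TableRow> good = converted.get(validRows);
PCollection<String> bad = converted.get(deadLetter);
// 'good' can feed BigQueryIO.writeTableRows(); 'bad' can be logged or written
// to a separate table/file for debugging.

As noted above, this only catches exceptions thrown in user code before the sink; a failure of the BigQuery batch load job itself is not caught this way, which is the open question in this thread.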
