There are a couple of options, and if you provide a job id (since you are using the Dataflow runner) we can better advise.
If you are writing to more than 2000 partitions, this won't work - BigQuery has a hard quota of 1000 partition updates per table per day.

If you have fewer than 1000 jobs, there are a few possibilities. It's possible that BigQuery is taking a while to schedule some of those jobs; they'll end up sitting in a queue waiting to be scheduled. We can look at one of the jobs in detail to see if that's happening.

Eugene's suggestion of using your pipeline to load into a single table might be the best one. You can write the date into a separate column, and then write a shell script to copy each date to its own partition (see https://cloud.google.com/bigquery/docs/creating-partitioned-tables#update-with-query for some examples).

On Wed, Sep 27, 2017 at 11:39 AM, Eugene Kirpichov wrote:

> I see. Then Reuven's answer above applies.
> Maybe you could write to a non-partitioned table, and then split it into
> smaller partitioned tables. See https://stackoverflow.com/a/39001706/278042
> for a discussion of the current options - granted, it seems like there
> currently aren't very good options for creating a very large number of
> table partitions from existing data.
>
> On Wed, Sep 27, 2017 at 4:01 AM Chaim Turkel wrote:
>
> > Thank you for your detailed response.
> > Currently I am a bit stuck.
> > I need to migrate data from Mongo to BigQuery; we have about 1 terabyte
> > of data. It is historical data, so I want to use BigQuery partitions.
> > It seems that the IO connector creates a job per partition, so it takes
> > a very long time, and I hit the BigQuery quota on the number of
> > jobs per day.
> > I would like to use streaming, but you cannot stream data that is more
> > than 30 days old.
> >
> > So I thought of partitions to see if I can get more parallelism.
> >
> > chaim
> >
> >
> > On Wed, Sep 27, 2017 at 9:49 AM, Eugene Kirpichov wrote:
> > > Okay, I see - there are about 3 different meanings of the word
> > > "partition" that could have been involved here (BigQuery partitions,
> > > runner-specific bundles, and the Partition transform), hence my
> > > request for clarification.
> > >
> > > If you mean the Partition transform - then I'm confused about what you
> > > mean by BigQueryIO "supporting" it. The Partition transform takes a
> > > PCollection and produces a bunch of PCollections; these are ordinary
> > > PCollections and you can apply any Beam transforms to them, and
> > > BigQueryIO.write() is no exception to this - you can apply it too.
> > >
> > > To answer whether using Partition would improve your performance, I'd
> > > need to understand exactly what you're comparing against what. I
> > > suppose you're comparing the following:
> > > 1) Applying BigQueryIO.write() to a PCollection, writing to a single
> > > table
> > > 2) Splitting a PCollection into several smaller PCollections using
> > > Partition, and applying BigQueryIO.write() to each of them, writing to
> > > different tables I suppose? (or do you want to write to different
> > > BigQuery partitions of the same table using a table partition
> > > decorator?)
> > > I would expect #2 to perform strictly worse than #1, because it writes
> > > the same amount of data but increases the number of BigQuery load jobs
> > > involved (thus increases per-job overhead and consumes BigQuery quota).
> > >
> > > On Tue, Sep 26, 2017 at 11:35 PM Chaim Turkel wrote:
> > >
> > >> https://beam.apache.org/documentation/programming-guide/#partition
> > >>
> > >> On Tue, Sep 26, 2017 at 6:42 PM, Eugene Kirpichov wrote:
> > >> > What do you mean by Beam partitions?
> > >> >
> > >> > On Tue, Sep 26, 2017, 6:57 AM Chaim Turkel wrote:
> > >> >
> > >> >> By the way, currently the performance on BigQuery partitions is
> > >> >> very bad.
> > >> >> Is there a repository where I can test with 2.2.0?
> > >> >>
> > >> >> chaim
> > >> >>
> > >> >> On Tue, Sep 26, 2017 at 4:52 PM, Reuven Lax wrote:
> > >> >> > Do you mean BigQuery partitions? Yes, however 2.1.0 has a bug if
> > >> >> > the table containing the partitions is not pre-created (fixed in
> > >> >> > 2.2.0).
> > >> >> >
> > >> >> > On Tue, Sep 26, 2017 at 6:40 AM, Chaim Turkel wrote:
> > >> >> >
> > >> >> >> Hi,
> > >> >> >>
> > >> >> >> Does BigQueryIO support Partitions when writing? Will it
> > >> >> >> improve my performance?
> > >> >> >>
> > >> >> >>
> > >> >> >> chaim

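For reference, here is a rough sketch of the "load into a single table, then copy each date to its own partition" idea from the top of this message. It is written in Python rather than shell, and it only builds and prints one `bq query` command per day. The dataset, table, and column names (`mydataset`, `staging`, `events`, `event_date`) are made up for illustration, and the sketch assumes the date column has type DATE. Treat it as a starting point, not a tested migration script.

```python
import shlex
from datetime import date, timedelta


def daterange(start, end):
    """Yield every date from start to end, inclusive."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)


def partition_copy_command(dataset, staging, target, day, date_column="event_date"):
    """Build a `bq query` argument list that copies one day into its partition.

    All table and column names are hypothetical placeholders.
    """
    # The $YYYYMMDD suffix is the partition decorator for that day.
    destination = f"{dataset}.{target}${day:%Y%m%d}"
    # Select only the rows of the staging table that belong to this day.
    sql = (
        f"SELECT * FROM `{dataset}.{staging}` "
        f"WHERE {date_column} = DATE '{day.isoformat()}'"
    )
    return [
        "bq", "query",
        "--use_legacy_sql=false",
        "--replace",  # overwrite the target partition if it already has data
        f"--destination_table={destination}",
        sql,
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for day in daterange(date(2017, 9, 1), date(2017, 9, 3)):
        cmd = partition_copy_command("mydataset", "staging", "events", day)
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Each printed command could be run by hand or passed to `subprocess.run`. Every command still writes one partition, so this presumably does not sidestep the per-table daily limits discussed above; it mainly lets you control how many partitions are written per day.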