Is it possible that you were getting a cache hit with the BQ operator? https://cloud.google.com/bigquery/docs/cached-results#bigquery-query-cache-api
The operator does not currently expose this flag, and I couldn't find whether the cache defaults to on or off for the insert-job API.

On Wed, Sep 27, 2017 at 9:41 AM, Tobias Feldhaus <[email protected]> wrote:

> I’ve created a table with only the missing value in the exact same
> partition, and then it’s going through. Could it be that the volume of the
> data plays a role or the client libraries maybe?
>
> On 27.09.2017, 17:46, "Tobias Feldhaus" <[email protected]>
> wrote:
>
> > Hi,
> >
> > I am tracing a bug in one of our data pipelines and I narrowed it down
> > to some small number of events not being in a table (using Airflow 1.8.2).
> > After running the query myself that airflow executed interactively, I
> > saw the missing entry. When airflow executed the same query, and writes the
> > results to a partitioned table in BQ it was missing in that destination
> > table.
> > I’ve tried different scenarios now several times and the only
> > explanation or difference I can come up with, is that airflow _might_ be
> > that using partitioned tables is not fully supported or there is some weird
> > bug in the bigquery-python implementation.
> >
> > When deleting the table and recreating it and reloading the complete
> > date with airflow the data is still missing. When reloading a single day,
> > it is also missing. I’ve created a python script to execute the exact same
> > query and it works as expected.
> >
> > Any advice how to track this down further? Is this a known issue?
> >
> > Best,
> > Tobias
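
One way to rule the cache in or out from your standalone Python script would be to set the flag explicitly in the job you submit. Below is a minimal sketch of a request body for the insert-job API with `useQueryCache` turned off, so the result does not depend on whatever the default is. The helper name `build_query_job_body`, the project/dataset/table names, and the `WRITE_TRUNCATE` disposition are my assumptions for illustration, not what the operator or your pipeline actually sends.

```python
import json


def build_query_job_body(sql, project_id, dataset_id, table_id, partition=None):
    """Build a jobs.insert request body with the query cache disabled.

    Hypothetical helper for illustration; it only assembles the dict and
    does not call BigQuery.
    """
    # A single partition is addressed with the "$YYYYMMDD" decorator.
    target = table_id if partition is None else "{}${}".format(table_id, partition)
    return {
        "configuration": {
            "query": {
                "query": sql,
                # Set explicitly so the run does not rely on the default.
                "useQueryCache": False,
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": target,
                },
                # Assumption: overwrite the target partition on each run.
                "writeDisposition": "WRITE_TRUNCATE",
            }
        }
    }


if __name__ == "__main__":
    # Placeholder names; substitute your own project, dataset and table.
    body = build_query_job_body(
        "SELECT 1", "my-project", "my_dataset", "events", partition="20170927"
    )
    print(json.dumps(body, indent=2))
```

If the row still goes missing with the cache explicitly disabled, the cache is probably not the cause, and comparing this body against what Airflow submits would be the next thing I'd look at.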
