Is it possible that you were getting a cache hit with the BQ operator?

https://cloud.google.com/bigquery/docs/cached-results#bigquery-query-cache-api

The operator does not currently expose this flag, and I couldn't find out
whether the cache defaults to on or off for the insert-job API.
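
For what it's worth, here is a rough sketch of how one could rule the cache out by hand, outside the operator. It assumes the REST field names `configuration.query.useQueryCache` and `statistics.query.cacheHit` from the BigQuery jobs resource. The helper names are made up for illustration and are not part of Airflow or any client library.

```python
def build_query_job_body(sql, use_query_cache=False):
    """Build a jobs.insert request body with the cache flag set explicitly.

    Hypothetical helper: only the nested field names come from the
    BigQuery jobs resource; everything else is illustrative.
    """
    return {
        "configuration": {
            "query": {
                "query": sql,
                # Set the flag explicitly instead of relying on the default.
                "useQueryCache": use_query_cache,
            }
        }
    }


def was_cache_hit(job_resource):
    """Return True if a finished query job reports a cache hit.

    Expects the dict form of a job resource, as returned by jobs.get.
    """
    stats = job_resource.get("statistics", {}).get("query", {})
    return bool(stats.get("cacheHit", False))
```

Comparing `was_cache_hit` on the job Airflow submitted with the same check on the interactive run should show whether the two were actually served differently.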

On Wed, Sep 27, 2017 at 9:41 AM, Tobias Feldhaus <
[email protected]> wrote:

> I’ve created a table with only the missing value in the exact same
> partition, and then it goes through. Could it be that the volume of the
> data plays a role, or maybe the client libraries?
>
> On 27.09.2017, 17:46, "Tobias Feldhaus" <[email protected]>
> wrote:
>
>     Hi,
>
>
>     I am tracing a bug in one of our data pipelines and I narrowed it down
> to some small number of events not being in a table (using Airflow 1.8.2).
>     After interactively running the query that Airflow executed, I saw the
> missing entry. When Airflow executed the same query and wrote the results to
> a partitioned table in BQ, the entry was missing from that destination
> table.
>     I’ve tried different scenarios several times now, and the only
> explanation or difference I can come up with is that using partitioned
> tables _might_ not be fully supported by Airflow, or that there is some weird
> bug in the bigquery-python implementation.
>
>     When deleting the table, recreating it, and reloading the complete
> data with Airflow, the entry is still missing. When reloading a single day,
> it is also missing. I’ve created a Python script to execute the exact same
> query, and it works as expected.
>
>     Any advice on how to track this down further? Is this a known issue?
>
>     Best,
>     Tobias
