Sounds like a possible solution; however, to avoid hitting this problem I’ve
deleted all the tables before rerunning anything. I think it might have to do
with the library. Airflow uses google-api-python-client, which is in maintenance
mode, and Google suggests switching to google-cloud-python. Tomorrow I will write
a DAG that uses a PythonOperator and then compare the two DAGs to see whether the
library could be the problem.
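One simple way to compare the two DAGs is to diff the rows each one wrote. Here is a minimal sketch. It assumes the rows from both destination tables have already been fetched (fetching is not shown) and converted to tuples of hashable values. `diff_rows` and the sample data are hypothetical. `Counter` is used so that duplicate rows are compared as a multiset rather than collapsed.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple

Row = Tuple[Hashable, ...]


def diff_rows(expected: Iterable[Row], actual: Iterable[Row]) -> Tuple[Counter, Counter]:
    """Return (missing, extra).

    missing: rows in `expected` that are absent from `actual`
    extra:   rows in `actual` that are absent from `expected`
    Counter subtraction keeps only positive counts, so duplicates are
    compared by multiplicity.
    """
    exp = Counter(expected)
    act = Counter(actual)
    return exp - act, act - exp


if __name__ == "__main__":
    # Hypothetical sample rows, for illustration only.
    interactive = [("2017-09-26", "evt-1"), ("2017-09-26", "evt-2")]
    airflow_dest = [("2017-09-26", "evt-1")]
    missing, extra = diff_rows(interactive, airflow_dest)
    print(dict(missing))  # {('2017-09-26', 'evt-2'): 1}
    print(dict(extra))    # {}
```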

On 27.09.2017, 19:15, "Chris Riccomini" <[email protected]> wrote:

    Is it possible that you were getting a cache hit with the BQ operator?
    
    
    https://cloud.google.com/bigquery/docs/cached-results#bigquery-query-cache-api
    
    The operator does not currently expose this flag, and I couldn't find
    whether the cache defaults to on or off for the insert-job API.
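To rule the cache in or out, the query job can be submitted with the cache flag set explicitly. Below is a minimal sketch of a BigQuery v2 `jobs.insert` request body using the REST API's `configuration.query` fields (`useQueryCache`, `useLegacySql`, `destinationTable`, `writeDisposition`). The helper name, the identifiers and the `WRITE_TRUNCATE` choice are assumptions for illustration. A body like this could be passed to google-api-python-client, for example `service.jobs().insert(projectId=..., body=body).execute()`, but that wiring is not shown.

```python
import json
from typing import Any, Dict


def build_query_job_body(
    project_id: str,
    dataset_id: str,
    table_id: str,
    sql: str,
    use_query_cache: bool = False,
    use_legacy_sql: bool = True,
) -> Dict[str, Any]:
    """Build a jobs.insert body for a query job that writes to a destination
    table, with useQueryCache set explicitly instead of relying on the
    server-side default."""
    return {
        "configuration": {
            "query": {
                "query": sql,
                # Should match the SQL dialect the Airflow task used.
                "useLegacySql": use_legacy_sql,
                # False asks BigQuery not to serve results from the query cache.
                "useQueryCache": use_query_cache,
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
                # Assumed disposition: overwrite the target on each run.
                "writeDisposition": "WRITE_TRUNCATE",
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    body = build_query_job_body("my-project", "my_dataset", "events", "SELECT 1")
    print(json.dumps(body, indent=2))
```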
    
    On Wed, Sep 27, 2017 at 9:41 AM, Tobias Feldhaus <
    [email protected]> wrote:
    
    > I’ve created a table containing only the missing value in the exact same
    > partition, and then it goes through. Could the volume of the data play a
    > role, or maybe the client libraries?
    >
    > On 27.09.2017, 17:46, "Tobias Feldhaus" <[email protected]>
    > wrote:
    >
    >     Hi,
    >
    >
    >     I am tracing a bug in one of our data pipelines, and I narrowed it down
    > to a small number of events not being in a table (using Airflow 1.8.2).
    >     After interactively running the same query that Airflow executed, I
    > saw the missing entry. When Airflow executed the same query and wrote the
    > results to a partitioned table in BQ, the entry was missing from that
    > destination table.
    >     I’ve tried different scenarios several times now, and the only
    > explanation or difference I can come up with is that Airflow _might_ not
    > fully support partitioned tables, or that there is some weird bug in the
    > bigquery-python implementation.
    >
    >     When I delete the table, recreate it, and reload all the data with
    > Airflow, the events are still missing. When I reload a single day, they
    > are also missing. I’ve written a Python script that executes the exact
    > same query, and it works as expected.
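When reloading a single day, a daily partition is usually addressed with a `$YYYYMMDD` decorator on the table name. Logging the exact decorated target that each run (Airflow and the standalone script) writes to is one way to confirm that both runs hit the same partition. Here is a minimal sketch, assuming a day-partitioned table. `partition_table_id` is a hypothetical helper.

```python
from datetime import date


def partition_table_id(table_id: str, day: date) -> str:
    """Return `table_id$YYYYMMDD`, addressing one daily partition.

    Raises if the table id already carries a decorator, so a partition
    is never double-decorated.
    """
    if "$" in table_id:
        raise ValueError(f"table id already has a decorator: {table_id!r}")
    return f"{table_id}${day:%Y%m%d}"


if __name__ == "__main__":
    # Hypothetical table name and day, for illustration only.
    print(partition_table_id("events", date(2017, 9, 26)))  # events$20170926
```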
    >
    >     Any advice on how to track this down further? Is this a known issue?
    >
    >     Best,
    >     Tobias
    >
    
