I am highly skeptical that it's the library.

On Wed, Sep 27, 2017 at 1:50 PM, Tobias Feldhaus wrote:
> This was exactly my point. Before I dig deeper, I want to build a very minimal PythonOperator that uses the new library, as I am currently comparing apples with oranges (same query, same data, different libraries). Still, it really puzzles me how a different library can yield different (read: partially missing) data when its job is just to execute a query, not to pull and transform the results.
>
> On 27.09.2017, 19:43, "Chris Riccomini" wrote:
>
> > Interesting. Just saw:
> >
> > https://github.com/google/google-api-python-client
> >
> > > This client library is supported but in maintenance mode only. We are fixing necessary bugs and adding essential features to ensure this library continues to meet your needs for accessing Google APIs. Non-critical issues will be closed. Any issue may be reopened if it is causing ongoing problems.
> >
> > Looks like we might want to migrate at some point. It'll be a big change.
> > <https://github.com/google/google-api-python-client#about>
> >
> > On Wed, Sep 27, 2017 at 10:41 AM, Chris Riccomini wrote:
> >
> > > AFAIK, google-api-python-client is not in maintenance mode. In fact, I believe the idiomatic Python library (google-cloud-python) is built off of google-api-python-client. I have spoken with several Google Cloud PMs who have pointed me at google-api-python-client as the canonical library to use, and the one that receives updates for new products first (before google-cloud-python).
> > >
> > > On Wed, Sep 27, 2017 at 10:34 AM, Tobias Feldhaus wrote:
> > >
> > > > Sounds like a possible explanation; however, to avoid hitting this problem I’ve deleted all the tables before rerunning stuff. I think it might have to do with the library. Airflow uses google-api-python-client, which is in maintenance mode, and Google suggests switching to google-cloud-python.
> > > > I will write a PythonOperator DAG tomorrow and will then check DAG against DAG to see if the library could be the problem.
> > > >
> > > > On 27.09.2017, 19:15, "Chris Riccomini" wrote:
> > > >
> > > > > Is it possible that you were getting a cache hit with the BQ operator?
> > > > >
> > > > > https://cloud.google.com/bigquery/docs/cached-results#bigquery-query-cache-api
> > > > >
> > > > > The operator does not currently expose this flag, and I couldn't find whether the cache defaults to on or off for the insert-job API.
> > > > >
> > > > > On Wed, Sep 27, 2017 at 9:41 AM, Tobias Feldhaus wrote:
> > > > >
> > > > > > I’ve created a table with only the missing value in the exact same partition, and then it goes through. Could it be that the volume of the data plays a role, or maybe the client libraries?
> > > > > >
> > > > > > On 27.09.2017, 17:46, "Tobias Feldhaus" wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I am tracing a bug in one of our data pipelines and have narrowed it down to a small number of events not being in a table (using Airflow 1.8.2). After interactively running the same query that Airflow executed, I saw the missing entry. When Airflow executed the same query and wrote the results to a partitioned table in BQ, the entry was missing from that destination table.
> > > > > > > I’ve now tried different scenarios several times, and the only explanation or difference I can come up with is that partitioned tables _might_ not be fully supported by Airflow, or that there is some weird bug in the bigquery-python implementation.
> > > > > > >
> > > > > > > When deleting the table, recreating it, and reloading the complete data with Airflow, the data is still missing. When reloading a single day, it is also missing.
> > > > > > > I’ve created a Python script to execute the exact same query, and it works as expected.
> > > > > > >
> > > > > > > Any advice on how to track this down further? Is this a known issue?
> > > > > > >
> > > > > > > Best,
> > > > > > > Tobias
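On the cache question raised in the thread: in the BigQuery REST API, `configuration.query.useQueryCache` defaults to true, but the API reference notes that the query cache is only available when the query has no destination table. A job that writes into a partitioned destination table should therefore not be served from cache. To rule the cache out explicitly anyway, the request body can set the flag to false. The sketch below builds such a jobs.insert body for a query that writes into one day partition via the `table$YYYYMMDD` decorator. The helper `build_query_job_body` and all project, dataset, and table names are hypothetical; this is not how Airflow's BigQueryOperator builds its request.

```python
import json
from datetime import date


def build_query_job_body(sql, project_id, dataset_id, table_id, partition_day):
    """Build a hypothetical jobs.insert request body for a BigQuery query job.

    The result is written into a single day partition of an ingestion-time
    partitioned table, addressed with the ``table$YYYYMMDD`` decorator.
    """
    partition_table = "{}${}".format(table_id, partition_day.strftime("%Y%m%d"))
    return {
        "configuration": {
            "query": {
                "query": sql,
                "useLegacySql": False,
                # Explicitly opt out of the query cache (the API default is true).
                "useQueryCache": False,
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": partition_table,
                },
                # Replace only the contents of this one partition.
                "writeDisposition": "WRITE_TRUNCATE",
                "createDisposition": "CREATE_IF_NEEDED",
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    body = build_query_job_body(
        "SELECT event_id FROM `my-project.raw.events`",
        "my-project", "analytics", "events", date(2017, 9, 27),
    )
    print(json.dumps(body, indent=2))
```

With google-api-python-client, a body like this would be submitted with `service.jobs().insert(projectId=..., body=body).execute()`, where `service` comes from `discovery.build("bigquery", "v2")`. Sending the same body from both code paths keeps the comparison to "same query, same data, same job configuration".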

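For the "DAG against DAG" check, it can help to pinpoint exactly which events go missing rather than only noticing that counts differ. Below is a minimal sketch that diffs the rows returned by the interactive query against the rows read back from the Airflow-written partition. It assumes both result sets have already been fetched into memory as lists of dicts keyed by a hypothetical `event_id` column. Fetching them with either client library is left out, and the helper `missing_rows` is illustrative, not part of Airflow or any Google library.

```python
from collections import Counter


def missing_rows(expected_rows, actual_rows, key="event_id"):
    """Return keys present in expected_rows but absent or under-counted in actual_rows.

    Counting keys instead of building sets also catches duplicates that were
    collapsed, e.g. two identical events of which only one was written.
    """
    expected = Counter(row[key] for row in expected_rows)
    actual = Counter(row[key] for row in actual_rows)
    # Counter subtraction keeps only the positive differences.
    diff = expected - actual
    return sorted(diff.elements())


if __name__ == "__main__":
    interactive = [{"event_id": 1}, {"event_id": 2}, {"event_id": 3}, {"event_id": 3}]
    airflow_table = [{"event_id": 1}, {"event_id": 3}]
    print(missing_rows(interactive, airflow_table))  # [2, 3]
```

Once the missing keys are known, their timestamps can be checked against the partition boundaries, which is a cheap way to see whether the gap follows the day partitioning rather than the client library.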