Thomas Haederle created AIRFLOW-3185:
----------------------------------------

             Summary: Add chunking to DBAPI_hook by implementing fetchmany and 
pandas chunksize
                 Key: AIRFLOW-3185
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3185
             Project: Apache Airflow
          Issue Type: Improvement
          Components: hooks
    Affects Versions: 1.10.0
            Reporter: Thomas Haederle
            Assignee: Thomas Haederle


DbApiHook currently implements get_records and get_pandas_df, both of which 
fetch the entire result set into memory.

We should implement two new methods, each returning a generator that yields 
results in chunks of a configurable size (chunksize):

- def get_many_records(self, sql, parameters=None, chunksize=20, 
  iterate_singles=False)
- def get_pandas_df_chunks(self, sql, parameters=None, chunksize=20)

This should work for all DB hooks that inherit from this class.
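
A minimal sketch of what the fetchmany-based generator might look like, not 
the proposed implementation. It is written as a free function over a plain 
DB-API connection (sqlite3 here, purely as a stand-in for whatever the hook's 
connection would be), and it assumes iterate_singles means "yield one row at 
a time instead of one chunk at a time", which the issue does not spell out.

```python
import sqlite3
from contextlib import closing


def get_many_records(conn, sql, parameters=None, chunksize=20,
                     iterate_singles=False):
    """Yield query results in chunks via the DB-API cursor.fetchmany().

    Hypothetical free-function version: `conn` stands in for the
    connection a hook would normally provide.
    """
    with closing(conn.cursor()) as cur:
        cur.execute(sql, parameters or ())
        while True:
            # Only `chunksize` rows are pulled from the cursor at a time.
            rows = cur.fetchmany(chunksize)
            if not rows:
                break
            if iterate_singles:
                # Assumed semantics: hand rows out one by one.
                for row in rows:
                    yield row
            else:
                yield rows


# Small in-memory table to exercise the generator.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(5)])

query = "SELECT n FROM t ORDER BY n"
chunks = list(get_many_records(conn, query, chunksize=2))
singles = list(get_many_records(conn, query, chunksize=2,
                                iterate_singles=True))
```

For the dataframe variant, pandas.read_sql already accepts a chunksize 
argument and then returns an iterator of DataFrames, so 
get_pandas_df_chunks could plausibly be a thin wrapper around that call.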

We could also adapt the existing methods instead, but that could be 
problematic: the adapted methods would return a generator, whereas the 
existing ones return either records or dataframes.
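
To illustrate the concern, here is a small sketch of how code written against 
a list-returning method could break if that method started returning a 
generator. The two functions are hypothetical stand-ins, not DbApiHook code.

```python
def get_records_list(rows):
    # Stand-in for the current behaviour: everything is materialised.
    return list(rows)


def get_records_gen(rows):
    # Stand-in for an adapted method that returns a generator instead.
    for row in rows:
        yield row


data = [(1,), (2,), (3,)]
as_list = get_records_list(data)
as_gen = get_records_gen(data)

# Callers commonly take the length of the result; this works on a list...
list_len = len(as_list)

# ...but a generator has no len().
try:
    len(as_gen)
    gen_len_ok = True
except TypeError:
    gen_len_ok = False

# A generator can also be consumed only once.
first_pass = list(as_gen)
second_pass = list(as_gen)
```

Keeping the existing methods unchanged and adding separate chunked methods 
avoids this kind of breakage for current callers.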



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
