Thomas Haederle created AIRFLOW-3185:
----------------------------------------
Summary: Add chunking to DBAPI_hook by implementing fetchmany and
pandas chunksize
Key: AIRFLOW-3185
URL: https://issues.apache.org/jira/browse/AIRFLOW-3185
Project: Apache Airflow
Issue Type: Improvement
Components: hooks
Affects Versions: 1.10.0
Reporter: Thomas Haederle
Assignee: Thomas Haederle
DbApiHook currently implements get_records and get_pandas_df; both methods
fetch the entire result set into memory.
We should add two new methods that each return a generator yielding results
in chunks of a configurable size:
- def get_many_records(self, sql, parameters=None, chunksize=20,
  iterate_singles=False)
- def get_pandas_df_chunks(self, sql, parameters=None, chunksize=20)
These methods should work for all DB hooks that inherit from this class.
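
A minimal sketch of how the two proposed methods might look, assuming the
hook only needs a DB-API 2.0 connection from get_conn(). The class name
ChunkedDbApiSketch and the sqlite3 demo table are hypothetical stand-ins, not
Airflow code, and the reading of iterate_singles (yield single rows instead of
lists of rows) is an assumption, since the issue does not define it.

```python
import sqlite3
from contextlib import closing


class ChunkedDbApiSketch:
    """Hypothetical stand-in for a DbApiHook subclass (not Airflow code)."""

    def __init__(self, conn):
        # A real hook would build the connection from an Airflow conn_id;
        # here a ready-made DB-API connection is passed in for illustration.
        self._conn = conn

    def get_conn(self):
        return self._conn

    def get_many_records(self, sql, parameters=None, chunksize=20,
                         iterate_singles=False):
        """Yield rows in lists of up to `chunksize` rows.

        Assumption: iterate_singles=True yields one row at a time while
        still fetching from the cursor in batches of `chunksize`.
        """
        with closing(self.get_conn().cursor()) as cur:
            if parameters is not None:
                cur.execute(sql, parameters)
            else:
                cur.execute(sql)
            while True:
                rows = cur.fetchmany(chunksize)
                if not rows:
                    break
                if iterate_singles:
                    for row in rows:
                        yield row
                else:
                    yield rows

    def get_pandas_df_chunks(self, sql, parameters=None, chunksize=20):
        """Yield DataFrames of up to `chunksize` rows each."""
        # Imported lazily so the records method works without pandas.
        import pandas as pd

        for chunk in pd.read_sql(sql, con=self.get_conn(),
                                 params=parameters, chunksize=chunksize):
            yield chunk


# Demo against an in-memory SQLite database with five rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(5)])

hook = ChunkedDbApiSketch(conn)
chunks = list(hook.get_many_records("SELECT n FROM t ORDER BY n",
                                    chunksize=2))
singles = list(hook.get_many_records("SELECT n FROM t ORDER BY n",
                                     chunksize=2, iterate_singles=True))
```

With chunksize=2 and five rows, the first call yields three lists (two, two,
and one row); with iterate_singles=True the same query yields five individual
rows.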
We could instead adapt the existing methods, but that could be problematic:
they would then return a generator, whereas callers currently expect a list
of records or a DataFrame.
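
To illustrate the compatibility concern, here is a small sketch with two
hypothetical functions (not part of DbApiHook): one returns a list of
records, the other a generator over the same rows. Code written against the
list version can break once the return type changes.

```python
def records_as_list():
    # Mirrors a method that returns all rows at once.
    return [(1, "a"), (2, "b"), (3, "c")]


def records_as_generator():
    # Mirrors a method adapted to yield rows lazily.
    for row in [(1, "a"), (2, "b"), (3, "c")]:
        yield row


rows = records_as_list()
count = len(rows)   # works: lists support len()
first = rows[0]     # works: lists support indexing

gen = records_as_generator()
try:
    len(gen)        # generators have no len()
    len_supported = True
except TypeError:
    len_supported = False

first_pass = list(gen)   # consumes the generator
second_pass = list(gen)  # already exhausted, so nothing is left
```

Existing callers that take len() of the result, index into it, or iterate
over it twice would need changes, which is the argument for adding new
methods instead of altering the current ones.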