[ https://issues.apache.org/jira/browse/AIRFLOW-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16645607#comment-16645607 ]
Thomas Haederle commented on AIRFLOW-3185:
------------------------------------------
Draft code for the get_many_records method:
{code:python}
import sys
from contextlib import closing


def get_many_records(self, sql, parameters=None, chunksize=20,
                     iterate_singles=False):
    """
    Executes the sql and returns a generator over the resulting records.

    :param sql: the sql statement to be executed
    :type sql: str
    :param parameters: The parameters to render the SQL query with.
    :type parameters: mapping or iterable
    :param chunksize: The number of records to fetch from the server with
        each roundtrip.
    :type chunksize: int
    :param iterate_singles: if True, yield one record at a time; otherwise
        yield lists of up to ``chunksize`` records
    :type iterate_singles: bool
    """
    if sys.version_info[0] < 3:
        sql = sql.encode('utf-8')

    with closing(self.get_conn()) as conn:
        with closing(conn.cursor()) as cur:
            if parameters is not None:
                cur.execute(sql, parameters)
            else:
                cur.execute(sql)
            while True:
                # Fetch at most `chunksize` rows per round trip to the server
                results = cur.fetchmany(chunksize)
                if not results:
                    # An empty result means the cursor is exhausted
                    break
                if iterate_singles:
                    for result in results:
                        yield result
                else:
                    yield results
{code}
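To see how the generator behaves, here is a minimal sketch that runs the same fetchmany loop against an in-memory SQLite database. `DemoSqliteHook` and its seeded table `t` are hypothetical stand-ins for a real DbApiHook subclass; only the stdlib `sqlite3` module is assumed, and the Python 2 encoding branch is omitted.

```python
import sqlite3
from contextlib import closing


class DemoSqliteHook(object):
    """Hypothetical stand-in for a DbApiHook subclass (illustration only)."""

    def get_conn(self):
        # Each call returns a fresh in-memory database seeded with 7 rows.
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE t (id INTEGER)")
        conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(7)])
        return conn

    def get_many_records(self, sql, parameters=None, chunksize=20,
                         iterate_singles=False):
        # Same loop as the draft above (Python 3 only).
        with closing(self.get_conn()) as conn:
            with closing(conn.cursor()) as cur:
                if parameters is not None:
                    cur.execute(sql, parameters)
                else:
                    cur.execute(sql)
                while True:
                    results = cur.fetchmany(chunksize)
                    if not results:
                        break
                    if iterate_singles:
                        for result in results:
                            yield result
                    else:
                        yield results


hook = DemoSqliteHook()

# Chunked mode: lists of up to 3 rows each.
chunks = list(hook.get_many_records("SELECT id FROM t ORDER BY id",
                                    chunksize=3))
print(chunks)  # [[(0,), (1,), (2,)], [(3,), (4,), (5,)], [(6,)]]

# Single-record mode with a bound parameter.
rows = list(hook.get_many_records("SELECT id FROM t WHERE id > ? ORDER BY id",
                                  parameters=(4,), iterate_singles=True))
print(rows)  # [(5,), (6,)]
```

With `chunksize=3` the seven rows arrive as chunks of 3, 3 and 1; once `fetchmany` returns an empty list the loop ends and both the cursor and the connection are closed.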
> Add chunking to DBAPI_hook by implementing fetchmany and pandas chunksize
> -------------------------------------------------------------------------
>
> Key: AIRFLOW-3185
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3185
> Project: Apache Airflow
> Issue Type: Improvement
> Components: hooks
> Affects Versions: 1.10.0
> Reporter: Thomas Haederle
> Assignee: Thomas Haederle
> Priority: Minor
> Labels: easyfix
> Original Estimate: 72h
> Remaining Estimate: 72h
>
> DbApiHook currently implements get_records and get_pandas_df, both of
> which fetch all records into memory at once.
> We should implement two new methods that return a generator with a
> configurable chunksize:
> - def get_many_records(self, sql, parameters=None, chunksize=20,
> iterate_singles=False)
> - def get_pandas_df_chunks(self, sql, parameters=None, chunksize=20)
> This should work for all DB hooks that inherit from this class.
> We could instead adapt the existing methods, but that would be problematic
> because they would then return generators, whereas callers currently
> expect either a list of records or a DataFrame.
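The proposed get_pandas_df_chunks could be sketched on top of pandas, assuming pandas is installed (the existing get_pandas_df already depends on it). When `pandas.read_sql` is given a `chunksize`, it returns an iterator of DataFrames instead of a single DataFrame. The helper below is hypothetical: it takes a `get_conn` callable so it runs outside Airflow, where a hook method would use `self.get_conn`, and `demo_conn` is an illustrative in-memory SQLite database.

```python
import sqlite3
from contextlib import closing

import pandas as pd


def get_pandas_df_chunks(get_conn, sql, parameters=None, chunksize=20):
    """Yield DataFrames of at most `chunksize` rows (hypothetical helper).

    `get_conn` is any callable returning a DB-API connection; inside a
    hook it would be `self.get_conn`.
    """
    with closing(get_conn()) as conn:
        # With chunksize set, read_sql returns a lazy iterator that keeps
        # reading from the open cursor, so we yield inside the `with` block
        # to keep the connection open until iteration finishes.
        for df in pd.read_sql(sql, con=conn, params=parameters,
                              chunksize=chunksize):
            yield df


def demo_conn():
    # Hypothetical in-memory database with 5 rows (illustration only).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)",
                     [(i, "row%d" % i) for i in range(5)])
    return conn


frames = list(get_pandas_df_chunks(demo_conn, "SELECT * FROM t ORDER BY id",
                                   chunksize=2))
print([len(df) for df in frames])  # [2, 2, 1]
```

The design choice worth noting is that the `yield` sits inside `with closing(...)`: pandas fetches each chunk from the cursor only when the iterator is advanced, so returning the iterator and closing the connection first would make iteration fail.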