Hello,

Here I am again with another proposal, which shouldn't be hard to evaluate. It 
is related to the work I did on dialects and performance enhancements in the 
common sql provider, and also to the PR for deferred pagination in the 
GenericTransfer operator, which I'm finishing as we speak and which I also 
mentioned in the Medium article I wrote about it:
https://medium.com/apache-airflow/transfering-data-from-sap-hana-to-mssql-using-the-airflow-generictransfer-d29f147a9f1f

At our company we use a custom SQLInsertRowsOperator, which allows us to 
persist XComs directly without having to write custom Python code, so it is 
again a kind of facilitator on top of the DbApiHook.
That is why the work on dialects and the other related PRs was so important: 
it lets the operator be implemented correctly, with as little logic as possible 
in the operator itself, so that all logic is handled within the hook and both 
options can be used the same way.

So my question is whether that operator would be accepted. The code would be 
minimal, and it could be added beside the other SQL operators in the common sql 
provider.
It is similar to the GenericTransfer operator, except that it doesn't read data 
from another database; it uses an XCom as input for the rows to be persisted by 
the insert_rows method of the DbApiHook.
It also offers some handy callback parameters to process the rows that have to 
be persisted.
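
To make the idea concrete, here is a minimal sketch of the core step such an 
operator might perform. It is not the actual implementation: FakeHook is a 
stand-in for the DbApiHook so the snippet runs without Airflow or a database, 
and the names insert_rows_from_xcom, preprocess and columns are hypothetical, 
chosen only for illustration.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence

Row = Sequence[Any]


class FakeHook:
    """Stand-in for a DbApiHook: records insert_rows calls instead of hitting a database."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def insert_rows(self, table, rows, target_fields=None, **kwargs) -> None:
        self.calls.append(
            {
                "table": table,
                "rows": list(rows),
                "target_fields": target_fields,
                "kwargs": kwargs,
            }
        )


def insert_rows_from_xcom(
    hook: Any,
    table_name: str,
    rows: List[Row],
    schema: Optional[str] = None,
    columns: Optional[List[str]] = None,
    preprocess: Optional[Callable[[List[Row]], List[Row]]] = None,  # hypothetical callback
    insert_args: Optional[Dict[str, Any]] = None,
) -> None:
    """Hand rows (for example pulled from an XCom) to the hook's insert_rows method."""
    if preprocess is not None:
        # Let the caller filter or reshape the rows before they are persisted.
        rows = preprocess(rows)
    # Qualify the table with the schema when one is given.
    table = f"{schema}.{table_name}" if schema else table_name
    # All insert logic stays in the hook; extra options are passed through unchanged.
    hook.insert_rows(table=table, rows=rows, target_fields=columns, **(insert_args or {}))


hook = FakeHook()
insert_rows_from_xcom(
    hook,
    table_name="table_name",
    rows=[(1, "a"), (2, None)],
    schema="schema",
    preprocess=lambda rs: [r for r in rs if r[1] is not None],
    insert_args={"commit_every": 5000},
)
```

The point of the sketch is that the operator itself stays thin: it only 
prepares the rows and delegates everything else to the hook.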

As I explained before, at our company we strive to have as little custom 
Python code as possible in our DAGs and to use existing Airflow operators as 
much as possible, which makes the DAGs easier to maintain.
This operator would allow Airflow users to easily persist XComs without having 
to write Python code.

Below is an example of how it could be used:

persist_records_task = SQLInsertRowsOperator(
    task_id="persist_records",
    conn_id="conn_id",
    schema="schema",
    table_name="table_name",
    insert_args={
        "commit_every": 5000,
        "replace": True,
        "executemany": True,
        "fast_executemany": True,
    },
    rows=csv_to_records_tasks.output,
)

What do you think about this proposal?
David Blain
Data Engineer at ICT-514 - BI End User Reporting

