dabla commented on PR #61144: URL: https://github.com/apache/airflow/pull/61144#issuecomment-3813364657
> I have a small concern regarding performance here. Iterating through rows and calling .read() on each LOB object might trigger N+1 network round-trips to the database.
> Have we considered using outputtypehandler to avoid this?

Thanks for raising this. The concern is valid: calling `.read()` on LOBs can indeed trigger additional round-trips, depending on driver configuration and LOB size.

However, when (C)LOB columns are selected and the result is returned from `get_first` / `get_records`, the LOB contents must be fully materialized anyway in order to be XCom-serializable. Using an outputtypehandler would shift when the LOB is read (during fetch rather than in post-processing), but it would not avoid the underlying cost of transferring and materializing the LOB data. If you don't want to materialize the LOBs, you can always use the `run` method, since no handler is specified there by default.

Since these methods (e.g. `get_records` and `get_first`) are used by operators that return results (e.g. GenericTransfer, SQLExecuteQueryOperator), returning raw Oracle LOB objects is not a viable option. This PR ensures correctness by guaranteeing that only serializable Python types are returned.

That said, I'm open to using an outputtypehandler, but I personally think that should be done in a separate PR if we go that way.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
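
For readers following the thread, the post-processing approach discussed in the comment above can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: `materialize_lobs` and `FakeLob` are hypothetical names, and the sketch assumes only that a LOB value exposes a `.read()` method returning its full contents.

```python
from typing import Any, Iterable


def materialize_lobs(rows: Iterable[tuple[Any, ...]]) -> list[tuple[Any, ...]]:
    """Return rows with every LOB-like value replaced by its contents.

    A value is treated as LOB-like if it has a callable ``read`` attribute
    (an assumption made for this sketch).
    """

    def convert(value: Any) -> Any:
        read = getattr(value, "read", None)
        return read() if callable(read) else value

    return [tuple(convert(value) for value in row) for row in rows]


class FakeLob:
    """Hypothetical stand-in for a driver LOB object; counts read() calls."""

    reads = 0

    def __init__(self, data: str) -> None:
        self._data = data

    def read(self) -> str:
        # With a real driver, this call is where an extra round-trip may occur.
        FakeLob.reads += 1
        return self._data


rows = [(1, FakeLob("first document")), (2, FakeLob("second document")), (3, None)]
result = materialize_lobs(rows)
```

Each LOB value is read exactly once, so with a real driver the number of potential extra round-trips grows with the number of LOB values in the result, which is the N+1 concern raised in the review. The returned rows contain only plain Python values.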
