henry3260 commented on PR #61144: URL: https://github.com/apache/airflow/pull/61144#issuecomment-3817932385
> > I have a small concern regarding performance here. Iterating through rows and calling `.read()` on each LOB object might trigger N+1 network round-trips to the database.
> >
> > Have we considered using `outputtypehandler` to avoid this?
>
> Thanks for raising this. The concern is valid, and it's a good point.
>
> Calling `.read()` on LOBs can indeed trigger additional round-trips, depending on driver configuration and LOB size. However, when (C)LOB columns are selected and the result is returned from `get_first` / `get_records`, the LOB contents must be fully materialized anyway in order to be XCom-serializable.
>
> Using an `outputtypehandler` would shift when the LOB is read (during fetch rather than post-processing), but it would not avoid the underlying cost of transferring and materializing the LOB data. If you don't want to materialize those, you could always use the `run` method, since no handler is specified there by default.
>
> Since these methods (e.g. `get_records` and `get_first`) are used by operators that return results (e.g. GenericTransfer, SQLExecuteQueryOperator), returning raw Oracle `LOB` objects is not a viable option. This PR ensures correctness by guaranteeing that only serializable Python types are returned.
>
> That said, I'm open to it, but I personally think that should be done in a separate PR anyway if we go that way.

Thanks for the explanation! I agree that materialization is necessary for XCom serialization regardless.

Just a small clarification: my concern regarding N+1 was about network latency (round-trips) rather than data volume. Using `outputtypehandler` allows the driver to prefetch LOB data within the same fetch round-trips, whereas explicit `.read()` calls usually force a separate network round-trip for each row.

However, I fully agree that this optimization is out of scope for this PR, as the priority here is correctness and fixing the serialization crash.
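
To make the N+1 concern concrete, here is a minimal offline sketch of the post-processing approach. It does not talk to a database and is not the code from this PR: `FakeLob` and `materialize_row` are hypothetical names, and the counter simply assumes, as described above, that every `.read()` costs one round-trip.

```python
class FakeLob:
    """Stand-in for a driver LOB locator (hypothetical, for illustration only)."""

    round_trips = 0  # assumption: each read() costs one network round-trip

    def __init__(self, data):
        self._data = data

    def read(self):
        FakeLob.round_trips += 1
        return self._data


def materialize_row(row):
    """Replace LOB-like values (anything with a callable read()) by their contents."""
    return tuple(
        value.read() if callable(getattr(value, "read", None)) else value
        for value in row
    )


# Three fetched rows, each carrying one LOB column.
rows = [(i, FakeLob(f"payload-{i}")) for i in range(3)]
records = [materialize_row(row) for row in rows]

print(records)              # [(0, 'payload-0'), (1, 'payload-1'), (2, 'payload-2')]
print(FakeLob.round_trips)  # 3, i.e. one extra read per row on top of the fetch
```

The result contains only plain Python values, which is what XCom serialization needs; the cost under this assumption is one extra read per row.
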
Let's merge this to fix the immediate bug, and we can look into optimizing `get_conn` with `outputtypehandler` in a future PR.
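
For reference, a rough sketch of what such a handler could look like. It assumes the python-oracledb driver and its newer `(cursor, metadata)` handler signature; older driver versions pass `(cursor, name, default_type, size, precision, scale)` instead. `make_lob_handler` is a hypothetical helper, not an existing Airflow or driver function, and the driver module is passed in so that the example can run here against stand-ins without a database.

```python
from types import SimpleNamespace


def make_lob_handler(db):
    """Build an output type handler; `db` is the driver module (e.g. oracledb)."""
    # LOB column types mapped to the LONG types that are fetched inline with the rows.
    pairs = (
        (db.DB_TYPE_CLOB, db.DB_TYPE_LONG),
        (db.DB_TYPE_NCLOB, db.DB_TYPE_LONG_NVARCHAR),
        (db.DB_TYPE_BLOB, db.DB_TYPE_LONG_RAW),
    )

    def handler(cursor, metadata):
        for lob_type, long_type in pairs:
            if metadata.type_code is lob_type:
                return cursor.var(long_type, arraysize=cursor.arraysize)
        return None  # leave every other column type at the driver default

    return handler


# Offline demonstration with stand-ins. With the real driver this would be:
#   connection.outputtypehandler = make_lob_handler(oracledb)
fake_db = SimpleNamespace(
    DB_TYPE_CLOB="CLOB", DB_TYPE_NCLOB="NCLOB", DB_TYPE_BLOB="BLOB",
    DB_TYPE_LONG="LONG", DB_TYPE_LONG_NVARCHAR="LONG_NVARCHAR",
    DB_TYPE_LONG_RAW="LONG_RAW", DB_TYPE_NUMBER="NUMBER",
)


class FakeCursor:
    """Stand-in cursor whose var() just echoes what was requested."""

    arraysize = 100

    def var(self, typ, arraysize):
        return (typ, arraysize)


handler = make_lob_handler(fake_db)
clob_result = handler(FakeCursor(), SimpleNamespace(type_code=fake_db.DB_TYPE_CLOB))
number_result = handler(FakeCursor(), SimpleNamespace(type_code=fake_db.DB_TYPE_NUMBER))

print(clob_result)    # ('LONG', 100)
print(number_result)  # None
```

With a handler like this set on the connection, LOB columns would come back as `str` or `bytes` as part of the row fetch, so no per-row `.read()` is needed. python-oracledb also documents a `defaults.fetch_lobs` setting with a similar effect; whether either option fits `get_conn` would need to be checked in the follow-up PR.
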
