henry3260 commented on PR #61144:
URL: https://github.com/apache/airflow/pull/61144#issuecomment-3817932385

   > > I have a small concern regarding performance here. Iterating through 
rows and calling .read() on each LOB object might trigger N+1 network 
round-trips to the database.
   > > Have we considered using outputtypehandler to avoid this?
   > 
   > Thanks for raising this — the concern is valid for sure, and a good point.
   > 
   > Calling `.read()` on LOBs can indeed trigger additional round-trips, 
depending on driver configuration and LOB size. However, when (C)LOB columns 
are selected and the result is returned from `get_first` / `get_records`, the 
LOB contents must be fully materialized anyway in order to be XCom-serializable.
   > 
   > Using an outputtypehandler would shift when the LOB is read (during fetch 
rather than post-processing), but it would not avoid the underlying cost of 
transferring and materializing the LOB data. If you don't want to materialize 
them, you can always use the `run` method, since no handler is specified there 
by default.
   > 
   > Since these methods (e.g. `get_records` and `get_first`) are used by 
operators that return results (e.g. GenericTransfer, SQLExecuteQueryOperator), 
returning raw Oracle `LOB` objects is not a viable option. This PR ensures 
correctness by guaranteeing that only serializable Python types are returned.
   > 
   > That said, I’m open to it, but I personally think it should be done in a 
separate PR if we go that way.
   
   Thanks for the explanation! I agree that materialization is necessary for 
XCom serialization regardless.
   
   Just a small clarification: my concern regarding N+1 was about network 
latency (round-trips) rather than data volume. Using outputtypehandler allows 
the driver to prefetch LOB data within the same fetch round-trips, whereas 
explicit .read() calls usually force separate network packets for each row.
   
   However, I fully agree that this optimization is out of scope for this PR, 
as the priority here is correctness and fixing the serialization crash. Let's 
merge this to fix the immediate bug, and we can look into optimizing get_conn 
with outputtypehandler in a future PR.
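
To make the round-trip concern concrete, here is a minimal, driver-free sketch of the post-processing approach discussed above. It is an illustration only, not the code from this PR: `FakeLob`, `materialize_row` and `materialize_rows` are hypothetical names, and the counter merely stands in for the network round-trip a real driver LOB may incur on each `.read()`.

```python
from typing import Any, Iterable, List, Tuple


class FakeLob:
    """Hypothetical stand-in for a driver LOB handle; counts read() calls."""

    reads = 0  # class-level counter shared by all instances

    def __init__(self, data: str) -> None:
        self._data = data

    def read(self) -> str:
        # With a real driver, each call here may cost one extra round-trip.
        FakeLob.reads += 1
        return self._data


def materialize_row(row: Iterable[Any]) -> Tuple[Any, ...]:
    """Replace every LOB-like value (anything with a callable .read) by its contents."""
    return tuple(
        value.read() if callable(getattr(value, "read", None)) else value
        for value in row
    )


def materialize_rows(rows: Iterable[Iterable[Any]]) -> List[Tuple[Any, ...]]:
    """Apply materialize_row to each fetched row."""
    return [materialize_row(row) for row in rows]


rows = [(1, FakeLob("alpha")), (2, FakeLob("beta")), (3, None)]
result = materialize_rows(rows)
```

After this runs, `result` contains only plain Python values, and `FakeLob.reads` equals the number of LOB values seen, one read per LOB on top of the fetch itself. That per-LOB cost is what an output type handler would aim to fold into the fetch.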

