adriangb commented on PR #20037:
URL: https://github.com/apache/datafusion/pull/20037#issuecomment-3811608973

   > From the perspective of user impacts, I see no complain. I do not remember 
seeing plan ser/de in any heat map as a problem
   
   It's not that the serde itself is slow (although this PR does improve it 
considerably). It's that the plan loses information it needs to perform 
correctly and to avoid allocating extra memory. As a result, executing the 
round-tripped plan can be orders of magnitude slower. I can give you concrete 
examples of affected queries: `select max(col) from t`, `select ... from t 
order by col limit 10`, `select ... from t1 join t2 on t1.pk = t2.t1_pk` and 
many others.
   
   I think you may not see complaints about it because (1) it's very hard for 
users to detect that something never got 1000x faster for them (i.e. they 
don't see the lack of improvement when upgrading from 51.0 to 52.0, so it's not 
a regression for them), (2) the speedups that this cancels out are relatively 
new (51.0 and 52.0) and (3) in cases like the duplication of InList expressions 
(which has been an issue for a long time), the extra memory consumption is 
theoretically unbounded, but in a real-world system it's probably only a couple 
of percent on average. I'm sure users would love to get 5% less memory 
consumption, but it's also very hard to detect that there's an extra 5% of 
memory consumption being left on the table, especially if you have nothing to 
compare to.
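
   To illustrate the duplication point in (3), here is a minimal sketch using 
only the Rust standard library. It is not DataFusion's code: `Node`, `encode`, 
`decode`, `shared_plan` and `distinct_allocations` are hypothetical names made 
up for this example, and the "serialization" is just a copy into plain vectors. 
It assumes several plan nodes share one large value list through an `Arc`, and 
shows that a naive round trip hands every node its own copy.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a plan node that references a large value list
/// (for example the literals of an `IN (...)` expression).
#[derive(Debug, Clone)]
struct Node {
    name: String,
    values: Arc<Vec<i64>>,
}

/// Builds `copies` nodes that all point at the same list allocation.
fn shared_plan(copies: usize, len: usize) -> Vec<Node> {
    let values = Arc::new((0..len as i64).collect::<Vec<_>>());
    (0..copies)
        .map(|i| Node {
            name: format!("node{i}"),
            values: Arc::clone(&values),
        })
        .collect()
}

/// Naive "serialization": each node writes out a full copy of its list,
/// so the fact that the list was shared is not recorded anywhere.
fn encode(nodes: &[Node]) -> Vec<(String, Vec<i64>)> {
    nodes
        .iter()
        .map(|n| (n.name.clone(), n.values.as_ref().clone()))
        .collect()
}

/// Naive "deserialization": each node gets a fresh allocation.
fn decode(encoded: Vec<(String, Vec<i64>)>) -> Vec<Node> {
    encoded
        .into_iter()
        .map(|(name, values)| Node {
            name,
            values: Arc::new(values),
        })
        .collect()
}

/// Counts how many distinct list allocations the nodes point at.
fn distinct_allocations(nodes: &[Node]) -> usize {
    let mut ptrs: Vec<*const Vec<i64>> =
        nodes.iter().map(|n| Arc::as_ptr(&n.values)).collect();
    ptrs.sort();
    ptrs.dedup();
    ptrs.len()
}
```

   Calling `distinct_allocations(&shared_plan(3, 1000))` gives one allocation, 
while the same call on `decode(encode(&plan))` gives one per node. The data is 
identical either way, which is why the extra memory is easy to miss.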


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

