adriangb commented on PR #20037: URL: https://github.com/apache/datafusion/pull/20037#issuecomment-3811608973
> From the perspective of user impacts, I see no complain. I do not remember seeing plan ser/de in any heat map as a problem

It's not that the serde itself is slow (although this PR does improve it considerably). It's that the deserialized plan loses information it needs to execute correctly and without allocating extra memory, so executing it can be orders of magnitude slower. Concrete examples of affected queries include `select max(col) from t`, `select ... from t order by col limit 10`, `select ... from t1 join t2 on t1.pk = t2.t1_pk`, and many others.

I think you may not see complaints about it because:

1. It's very hard for users to notice that something never got 1000x faster for them. They don't see the missing improvement when upgrading from 51.0 to 52.0, so from their point of view it isn't a regression.
2. The speedups that this cancels out are relatively new (51.0 and 52.0).
3. In cases like the duplication of InList expressions (a long-standing issue), the extra memory consumption is theoretically unbounded, but in a real-world system it's probably only a couple of percent on average. I'm sure users would love 5% lower memory consumption, but it's very hard to notice an extra 5% being left on the table, especially when you have nothing to compare against.
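To make the InList duplication point concrete, here is a minimal sketch in Rust. The `Expr` type, the `Wire` format, and the `serialize`/`deserialize` functions are all hypothetical simplifications, not DataFusion's actual `PhysicalExpr` or protobuf types. A serializer that ignores `Arc` identity turns one shared `InList` into two independent copies after a round trip. A serializer that tracks pointers and emits back-references keeps the sharing.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical, simplified expression tree (not DataFusion's PhysicalExpr).
#[derive(Debug)]
enum Expr {
    InList { col: String, values: Vec<i64> },
    And(Arc<Expr>, Arc<Expr>),
}

// Hypothetical wire format. `id` is only set when deduplication is enabled;
// `Ref(id)` points back at an InList node that was already written.
#[derive(Debug, Clone)]
enum Wire {
    InList { id: Option<usize>, col: String, values: Vec<i64> },
    And(Box<Wire>, Box<Wire>),
    Ref(usize),
}

// With `dedup`, InList nodes are tracked by Arc pointer so that repeated
// occurrences of the same allocation are written as back-references.
fn serialize(e: &Arc<Expr>, dedup: bool, seen: &mut HashMap<*const Expr, usize>) -> Wire {
    let key = Arc::as_ptr(e);
    if dedup {
        if let Some(&id) = seen.get(&key) {
            return Wire::Ref(id);
        }
    }
    match e.as_ref() {
        Expr::InList { col, values } => {
            let id = if dedup {
                let id = seen.len();
                seen.insert(key, id);
                Some(id)
            } else {
                None
            };
            Wire::InList { id, col: col.clone(), values: values.clone() }
        }
        Expr::And(l, r) => Wire::And(
            Box::new(serialize(l, dedup, seen)),
            Box::new(serialize(r, dedup, seen)),
        ),
    }
}

// Rebuilds the tree; back-references resolve to the same Arc, so the
// shared InList is allocated once instead of once per occurrence.
fn deserialize(w: &Wire, cache: &mut HashMap<usize, Arc<Expr>>) -> Arc<Expr> {
    match w {
        Wire::Ref(id) => Arc::clone(&cache[id]),
        Wire::InList { id, col, values } => {
            let e = Arc::new(Expr::InList { col: col.clone(), values: values.clone() });
            if let Some(id) = id {
                cache.insert(*id, Arc::clone(&e));
            }
            e
        }
        Wire::And(l, r) => Arc::new(Expr::And(deserialize(l, cache), deserialize(r, cache))),
    }
}

fn main() {
    // One large InList shared (via Arc) in two places of the plan.
    let shared = Arc::new(Expr::InList { col: "a".into(), values: (0..1000).collect() });
    let plan = Arc::new(Expr::And(Arc::clone(&shared), Arc::clone(&shared)));

    for dedup in [false, true] {
        let wire = serialize(&plan, dedup, &mut HashMap::new());
        let back = deserialize(&wire, &mut HashMap::new());
        if let Expr::And(l, r) = back.as_ref() {
            println!("dedup={dedup}: children share one allocation: {}", Arc::ptr_eq(l, r));
        }
    }
}
```

In this model, the naive round trip doubles the memory held by the `InList` values, and nothing fails or warns. That matches the point above that the extra consumption is hard to notice without a baseline.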
