geoffreyclaude commented on PR #19390:
URL: https://github.com/apache/datafusion/pull/19390#issuecomment-4555805154

   > > @adriangb Sorry for the late reply, I was off and somehow missed this 
notification. I agree that this change is pretty well tested and benchmarked 
now, so I'd be pretty surprised if it introduces major issues (we might still 
have some weird edge case of course, as usual.) What I'm a bit more concerned 
about is maintainability as it does introduce quite a bit of complex features. 
In particular, "Commit 7: Implement Direct Probe", while showing significant 
gains, is pretty tricky to understand well. I would never have come up with 
something like this without AI "helping" out. WDYT? Are the performance gains 
worth the additional future headaches when new work in this area is needed?
   > 
   > That is my concern as well :)
   > 
   > It's hard for me to judge what is complex because I've never seen it 
before / don't have a CS degree vs. is complex for anyone. I'm prepared to 
basically trust your judgment on it.
   > 
   > How about this test for each commit / optimization: if your team came and 
said "hey we are hitting an `InList` bug in prod, please go fix it" and you had 
never seen the code before, would you feel "this is complicated but the X% perf 
gain makes it worth it" or "my first instinct is to rip this out it's not worth 
it". Maybe we pick 1-2 lowest "bang for the buck" optimizations in here and 
stash those for later?
   > 
   > One thing we could do is merge as stacked PRs so that commits don't get 
squashed and it's easier to revert individual pieces. Ultimately if there were 
issues because this is self contained I think the initial answer would be to 
revert the problematic optimization.
   
   @adriangb Good idea, stacked PRs is probably the safest bet here. We don't 
want all commits to get squashed into a single one (is that the merge policy on 
`datafusion`?), impossible to revert out of selectively. With stacked PRs, each 
optimization gets its own dedicated PR/commit, easy to revert and git bisect in 
case of issue (manually or with coding agents.)
   The most complex commit I mentioned above does have some pretty significant 
gains, so it would be a shame to not take advantage of it! I'll work on 
stacking these properly (but maybe we can skip the in-depth review of each and 
focus on the big picture here.)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to