cheddar commented on issue #12022:
URL: https://github.com/apache/druid/issues/12022#issuecomment-985985486


   > > Workaround (Rejected):
   > > To retain compatibilty, we could add a new default long 
aggregateAndEstimateMemory() method and leave the existing aggregate() method 
as is. The default implementation would return a negative value (say -1) in 
which case the OnHeapIncrementalIndex or any other estimator would use the max 
estimated size for that invocation.
   > 
   > This approach actually seems like a good idea to me for a few reasons:
   > 
   >     * Aggregation is hot code, and there will be some overhead to size 
estimation, and query-time callers don't necessarily need it. So having two 
methods allows the caller to decide if it actually needs size estimates or not.
   > 
   >     * We want to avoid incompatible changes to aggregation interfaces 
whenever possible.
   
   While this proposal does introduce a pretty big backwards incompatibility 
and I hate introducing such compatibilities, doing this in the proposed 
backwards compatible way adds a branch after every single call to the 
aggregator which, given that this is a hot method, could have significant 
inadvertent impacts to query performance.  Having the method be guaranteed to 
return a value that is always added minimizes that impact.
   
   That said, with just the proposed interface, the conditional still moves 
into the implementation of the Aggregator itself for simple aggregators like a 
sum aggregator where it's a fixed memory size from initialization as the first 
call to `aggregate()` doubles as the "initialization" call.
   
   Thinking about this gave me another idea for how to minimize the 
incompatibility:
   
   We could add a method `getInitializedMemorySize()` which returns how much 
memory is consumed by just being initialized.  For static-sized aggregators, 
this would essentially equate to returning the current size estimation from 
this one method and then always returning 0 from the `aggregate()` call.  
   
   Alternatively, instead of adding a method to the Aggregator interface, we 
could give `AggregatorFactory` an `AggregatorFactory.factorizeWithSize()` which 
returns a "holder" object that has both a reference to the `Aggregator` and its 
size.  Given that `AggregatorFactory` is the thing that knows the intermediate 
size right now, this approach would actually allow for a default implementation 
that pushes the current size into the return object.  If we then create a 
``Aggregator.aggregateWithSize` method, it could default to an implementation 
that calls `aggregate` and returns 0.  Which now actually does make this a 
completely backwards compatible change.  My only worry with that is that the 
most common implementation of `Aggregator.aggregate` would likely be via that 
default implementation and I'm not sure if this would be adding yet another 
virtual function call to the hot path.  
   
   The least-risk of impact approach is probably to add the 
`factorizeWithSize()` and still give `aggregate` a return type.  This is 
backwards incompatible, but the vast majority of implementations just need to 
add a `return 0;` so, at least the change required is minimized.
   
   Perhaps we try the fully backwards compatible approach, benchmark it.  
Adjust to the incompatible, non-default method, benchmark that and see if 
there's a marked difference between them?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to