clintropolis commented on pull request #11917:
URL: https://github.com/apache/druid/pull/11917#issuecomment-968442515


   Just to weigh in - despite raising this PR I'm actually in the camp that 
@gianm is in on this, else I probably would've done #11713 like this in the 
first place. I guess in my head I've been assuming whatever we eventually call 
Druid 1.0 is where I have been considering taking up more conservative stances 
on extension point changes (but maybe also 1.0 means nothing because people use 
Druid in production environments and have for years so isn't a real excuse).
   
   Assuming that those who implement custom aggs are at least recompiling their 
extension every release (which they really really should be doing), then the 
changes in #11713 seem like not a very big deal to me, because it would fail to 
compile and so be noticed immediately and has a very easy fix that shouldn't 
actually require extensive testing. This isn't changing the interface of the 
aggregator implementations themselves, which I think would be more involved for 
extension writers and require more care, just requiring factories to provide 
additional information.
   
   >On the argument of increasing support burden, that argument doesn't apply 
here as the "unknown complex" is needed anyway, you'll note that this change 
isn't introducing UNKNOWN_COMPLEX it is just using it in a default 
implementation.
   
   This isn't quite accurate, not implementing `getFinalizedColumnType` (as it 
has been renamed in this PR) is a loss of information when it has to fall back 
on `UNKNOWN_COMPLEX` compared to aggregators that properly implement the 
interface (`UNKNOWN_COMPLEX` was added in #11713 as a short term solution for 
serde compat stuff, and eventually should be removed). Prior to #11713 there 
was no possible way to get what actual complex type an aggregator would 
finalize to, since there was no `getFinalizedColumnTypeName` method (part of 
the work of #11713 was going through all existing aggregator implementations 
and filling in the correct type information). 
   
   The engine (and future changes I would like to do to it) would generally 
prefer to have the complete type information however, which is only available 
by actually implementing the new methods. I think it could be fairly argued 
that nothing is broken (yet) by falling back to unknown complex since we 
haven't heavily leaned into everything #11713 made possible (and since that PR 
introduced it it acknowledges that things are incomplete and stuff has to 
compensate for it). It does makes it harder to actually take advantage of this 
new information though, since it draws out the period that we still need to 
handle the unknown case. 
   
   
   On the subject of this PR only impacting a single release, I'm not entirely 
sure that is true either unless we just leave these functions named as they are 
renamed in this PR - `getColumnType` and `getFinalizedColumnType`. To get fully 
back into the state of #11713, e.g. with the methods named `getType` and 
`getFinalizedType`, I think it would take at least 3 releases to get done if we 
do it incrementally like this. 0.23 which would deprecate and add the new 
methods, 0.24 which could remove the old versions, and possibly make new 
versions of `getType` and `getFinalizedType` back to how they are in #11713 
with default implementations that call `getColumnType` and 
`getFinalizedColumnType` which would then be deprecated, and then finally in 
0.25 we can remove `getColumnType` and `getFinalizedColumnType`. This disrupts 
extension writers at least twice instead of once, which seems worse?
   
   I guess I obviously don't feel so strongly as to block this change either, 
lest I wouldn't have opened this PR, but I still don't personally feel like we 
gain much to make it strictly necessary either, and it definitely isn't without 
pain, so I'm at best a +0 on this (talked up from a -1). 
   
   +1 for having this discussion though... should we take it to the dev list to 
try to get more feedback on this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to