gianm opened a new pull request, #12520:
URL: https://github.com/apache/druid/pull/12520

   In the majority of cases, this improves performance.
   
   There's only one case I'm aware of where this may be a net negative: for 
`time_floor(__time, <period>)` where there are many repeated `__time` values. 
In nonvectorized processing, 
`SingleLongInputCachingExpressionColumnValueSelector` implements an optimization 
to avoid computing the `time_floor` function on every row. There is no such 
optimization in vectorized processing.
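
   To illustrate the kind of optimization described above, here is a minimal 
sketch of a "reuse the previous result when the input repeats" wrapper. It is 
not Druid's actual code: the names `CachingFloorSketch`, `LastValueCache`, and 
`floorToHour` are hypothetical, and the hourly floor is a simple stand-in for 
`time_floor(__time, <period>)`.

```java
import java.util.function.LongUnaryOperator;

// Illustrative sketch only; not the Druid implementation.
final class CachingFloorSketch {
    static final long HOUR_MILLIS = 3_600_000L;

    /** Wraps a long -> long function and reuses the last result when the input repeats. */
    static final class LastValueCache implements LongUnaryOperator {
        private final LongUnaryOperator delegate;
        private boolean hasCached = false;
        private long lastInput;
        private long lastOutput;
        private long evaluations = 0;

        LastValueCache(LongUnaryOperator delegate) {
            this.delegate = delegate;
        }

        @Override
        public long applyAsLong(long input) {
            // Only call the wrapped function when the input differs from the previous row.
            if (!hasCached || input != lastInput) {
                lastOutput = delegate.applyAsLong(input);
                lastInput = input;
                hasCached = true;
                evaluations++;
            }
            return lastOutput;
        }

        /** Number of times the wrapped function was actually invoked. */
        long evaluations() {
            return evaluations;
        }
    }

    /** Stand-in for an hourly time floor on epoch milliseconds. */
    static long floorToHour(long millis) {
        return Math.floorDiv(millis, HOUR_MILLIS) * HOUR_MILLIS;
    }

    public static void main(String[] args) {
        // Many repeated timestamps, as in the case described above.
        long[] timestamps = {3_600_000L, 3_600_000L, 3_600_000L, 7_260_000L, 7_260_000L};
        LastValueCache cachedFloor = new LastValueCache(CachingFloorSketch::floorToHour);
        for (long t : timestamps) {
            cachedFloor.applyAsLong(t);
        }
        System.out.println("rows: " + timestamps.length
            + ", function evaluations: " + cachedFloor.evaluations());
    }
}
```

   The saving only materializes when equal `__time` values sit next to each 
other, which is part of why the benefit depends on the dataset.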
   
   IMO, we shouldn't mention this in the docs. Rationale: It's too fiddly of a 
thing: it's not guaranteed that nonvectorized processing will be faster due to 
the optimization, because it would have to overcome the inherent speed 
advantage of vectorization. So it'd always require testing to determine the 
best setting for a specific dataset. It would be bad if users disabled 
vectorization thinking it would speed up their queries, and it actually slowed 
them down. And even if users do their own testing, at some point in the future 
we'll implement the optimization for vectorized processing too, and it's likely 
that users who explicitly disabled vectorization will continue to have it 
disabled. I'd like to avoid this outcome by encouraging all users to enable 
vectorization at all times. Really advanced users would be following 
development activity anyway, and can read this issue 🙂


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

