Re: [DISCUSS] Stream Pipelines on hot paths

Benedict Elliott Smith Fri, 31 May 2024 07:20:46 -0700

I think I have already proposed a simple solution to this problem on the 
thread: if anyone says it’s a hot path (and cannot be persuaded otherwise), it 
should be treated as such. Saves argument, but permits an easy escape hatch if 
everyone agrees with minimal discussion.


I think this is a good general principle for raising standards in the codebase 
like this: if somebody says something is important, and cannot be convinced 
otherwise, then it should generally be treated as important. This is different 
from cases where there are simply competing approaches.

That said, if people want to be absolutist about this I won’t mind.



> On 31 May 2024, at 15:04, Benjamin Lerer <[email protected]> wrote:
> 
> For me the definition of hot path is too vague. We had arguments with 
> Berenger multiple times and it is more a waste of time than anything else at 
> the end. If we are truly concerned about stream efficiency then we should 
> simply forbid them. That will avoid lengthy discussions about what constitute 
> the hot path and what does not.
> 
> Le ven. 31 mai 2024 à 11:08, Berenguer Blasi <[email protected] 
> <mailto:[email protected]>> a écrit :
>> +1 on avoiding streams in hot paths
>> 
>> On 31/5/24 9:48, Benedict wrote:
>>> My concept of hot path is simply anything we can expect to be called 
>>> frequently enough in normal operation that it might show up in a profiler. 
>>> If it’s a library method then it’s reasonable to assume it should be able 
>>> to be used in a hot path unless clearly labelled otherwise. 
>>> 
>>> In my view this includes things that might normally be masked by caching 
>>> but under supported workloads may not be - such as query preparation.
>>> 
>>> In fact, I’d say the default assumption should probably be that a method is 
>>> “in a hot path” unless there’s good argument they aren’t - such as that the 
>>> operation is likely to be run at some low frequency and the slow part is 
>>> not part of any loop. Repair setup messages perhaps aren’t a hot path for 
>>> instance (unless many of them are sent per repair), but validation 
>>> compaction or merkle tree construction definitely is.
>>> 
>>> I think it’s fine to not have perfect agreement about edge cases, but if 
>>> anyone in a discussion thinks something is a hot path then it should be 
>>> treated as one IMO.
>>> 
>>>> On 30 May 2024, at 18:39, David Capwell <[email protected]> 
>>>> <mailto:[email protected]> wrote:
>>>> 
>>>>  As a general statement I agree with you (same for String.format as 
>>>> well), but one thing to call out is that it can be hard to tell what is 
>>>> the hot path and what isn’t.  When you are doing background work (like 
>>>> repair) its clear, but when touching something internal it can be hard to 
>>>> tell; this can also be hard with shared code as it gets authored outside 
>>>> the hot path then later used in the hot path…
>>>> 
>>>> Also, what defines hot path?  Is this user facing only?  What about 
>>>> Validation/Streaming (stuff processing a large dataset)?  
>>>> 
>>>>> On May 30, 2024, at 9:29 AM, Benedict <[email protected]> 
>>>>> <mailto:[email protected]> wrote:
>>>>> 
>>>>> Since it’s related to the logging discussion we’re already having, I have 
>>>>> seen stream pipelines showing up in a lot of traces recently. I am 
>>>>> surprised; I thought it was understood that they shouldn’t be used on hot 
>>>>> paths as they are not typically as efficient as old skool for-each 
>>>>> constructions done sensibly, especially for small collections that may 
>>>>> normally take zero or one items.
>>>>> 
>>>>> I would like to propose forbidding the use of streams on hot paths 
>>>>> without good justification that the cost:benefit is justified. 
>>>>> 
>>>>> It looks like it was nominally agreed two years ago that we would include 
>>>>> words to this effect in the code style guide, but I forgot to include 
>>>>> them when I transferred the new contents from the Google Doc proposal. So 
>>>>> we could just include the “Performance” section that was meant to be 
>>>>> included at the time.
>>>>> 
>>>>> lists.apache.org
>>>>> <favicon.ico>
>>>>>  
>>>>> <https://lists.apache.org/thread/1mt8rsg36p1mq8s8578l6js075lrmvlt>lists.apache.org
>>>>>  <https://lists.apache.org/thread/1mt8rsg36p1mq8s8578l6js075lrmvlt>    
>>>>> <favicon.ico> 
>>>>> <https://lists.apache.org/thread/1mt8rsg36p1mq8s8578l6js075lrmvlt>
>>>>> 
>>>>> 
>>>>>> On 30 May 2024, at 13:33, Štefan Miklošovič 
>>>>>> <[email protected]> <mailto:[email protected]> wrote:
>>>>>> 
>>>>>> 
>>>>>> I see the feedback is overall positive. I will merge that and I will 
>>>>>> improve the documentation on the website along with what Benedict 
>>>>>> suggested.
>>>>>> 
>>>>>> On Thu, May 30, 2024 at 10:32 AM Mick Semb Wever <[email protected] 
>>>>>> <mailto:[email protected]>> wrote:
>>>>>>>   
>>>>>>>  
>>>>>>>> Based on these findings, I went through the code and I have 
>>>>>>>> incorporated these rules and I rewrote it like this:
>>>>>>>> 
>>>>>>>> 1) no wrapping in "if" if we are not logging more than 2 parameters.
>>>>>>>> 2) rewritten log messages to not contain any string concatenation but 
>>>>>>>> moving it all to placeholders ({}).
>>>>>>>> 3) wrap it in "if" if we need to execute a method(s) on parameter(s) 
>>>>>>>> which is resource-consuming.
>>>>>>> 
>>>>>>> 
>>>>>>> +1
>>>>>>> 
>>>>>>> 
>>>>>>> It's a shame slf4j botched it with lambdas, their 2.0 fluent api 
>>>>>>> doesn't impress me.
>>>>

Re: [DISCUSS] Stream Pipelines on hot paths

Reply via email to