Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics

Matthias J. Sax Tue, 16 Jul 2019 14:44:40 -0700

It's a fair point that Ryanne raises. However, "running sum" is the same
as "moving sum" from my understanding.


The issue is still, that `Sum` and `Count` which seem to be the cleanest
names cannot be used. While I agree that `TotalSum` and `TotalCount` is
somewhat redundant, I still think it the best suggestion so far.

For the "sampled" version, I am personally fine with either `MovingXxx`,
`WindowedXxx`, or `RunningXxx` -- to me, that are all equally good to
describe the semantics.



-Matthias



On 7/16/19 2:25 PM, Sophie Blee-Goldman wrote:
> I'm +1 on Windowed, was about to suggest that as I was catching up on the
> discussion but Bill beat me to it :)
> 
> On Tue, Jul 16, 2019 at 2:23 PM Bill Bejeck <[email protected]> wrote:
> 
>> Hi John,
>>
>> Thanks for the updates.
>>
>> I like RunningCount and RunningSum.
>>
>> What about WindowedCount, WindowedSum instead of Moving?
>> I'm just throwing this out there as Windowed seems more intuitive to me,
>> but I'm not married to the idea.
>>
>> -Bill
>>
>> On Tue, Jul 16, 2019 at 5:09 PM John Roesler <[email protected]> wrote:
>>
>>> No worries! Choosing good public API names is a high-impact design
>>> activity.
>>>
>>> Matthias, Bruno, Bill, and Stanislav,
>>>
>>> You've all contributed to this discussion or the vote so far... How do
>>> you feel about the proposed name change:
>>>
>>> MovingCount, MovingSum   (instead of Sampled)
>>> RunningCount, RunningSum   (Instead of Total)
>>>
>>> Thanks,
>>> -John
>>>
>>> On Tue, Jul 16, 2019 at 3:04 PM Ryanne Dolan <[email protected]>
>>> wrote:
>>>>
>>>> John, that makes sense to me. Sorry for the bikeshedding.
>>>>
>>>> Ryanne
>>>>
>>>> On Tue, Jul 16, 2019 at 12:49 PM John Roesler <[email protected]>
>> wrote:
>>>>
>>>>> Thanks for the explanation and the suggestion, Ryanne,
>>>>>
>>>>> I went with "sampled" just because these are instances of
>> SampledStat,
>>>>> which in the Kafka Metrics ecosystem are computed from a window of
>>>>> recent samples. Thinking more about it, the fact that they are
>> sampled
>>>>> and the fact that they are windowed are orthogonal, which is what
>>>>> you're pointing out... sampling by itself doesn't indicate that it's
>> a
>>>>> moving average.
>>>>>
>>>>> Since there is no way in Kafka Metrics for a metric to be sampled and
>>>>> not windowed/moving/decaying, calling them Sampled would never be
>>>>> incorrect. But to someone unfamiliar with the code, it wouldn't
>>>>> immediately suggest the behavior of the metric that actually matters.
>>>>> That is, the behavior that distinguishes the two classes of metrics
>> we
>>>>> want to disambiguate here.
>>>>>
>>>>> It sounds like you'd suggest a new matrix of names:
>>>>> MovingCount, MovingSum
>>>>> RunningCount, RunningSum
>>>>>
>>>>> Are these names unambiguous and self explanatory?
>>>>>
>>>>> Thanks,
>>>>> -John
>>>>>
>>>>> On Tue, Jul 16, 2019 at 12:32 PM Ryanne Dolan <[email protected]
>>>
>>>>> wrote:
>>>>>>
>>>>>>> measurements, which decay/expire over time
>>>>>>
>>>>>> Thanks John for the clarification. This was my (re-)reading of the
>>> code,
>>>>>> but this is not what I think of when I hear "sampled". In fact,
>>> you'll
>>>>>> notice that the Wikipedia pages for "Sample (statistics)" and
>> "Sample
>>>>>> (signal processing)" do not contain the words decay, expire,
>> recent,
>>>>>> history, or anything similar.
>>>>>>
>>>>>> Similar to "running", I'd suggest the more correct "moving", as in
>>>>> "moving
>>>>>> average" and "moving sum", which involve looking back N samples,
>>>>> applying a
>>>>>> sliding window, decaying over time etc.
>>>>>>
>>>>>> Ryanne
>>>>>>
>>>>>> On Tue, Jul 16, 2019, 11:58 AM John Roesler <[email protected]>
>>> wrote:
>>>>>>
>>>>>>> Thanks for raising this concern, Ryanne,
>>>>>>>
>>>>>>> "Sampled" indicates that the metrics is sampled, namely that we
>>>>>>> maintain a set of samples from recent value measurements, which
>>>>>>> decay/expire over time. So, the metric value is only
>>> representative of
>>>>>>> the recent past.
>>>>>>>
>>>>>>> "Total" indicates that the metric value contains all the
>>> information
>>>>>>> from the creation of the metric. For example., the total sum
>> would
>>>>>>> include all measurements since the app started up.
>>>>>>>
>>>>>>> It seems like your concern is that the word "total" doesn't
>> really
>>>>>>> pinpoint this meaning, which is true. It's especially confusing
>>> that
>>>>>>> another meaning of "total" is synonymous with "sum", rendering
>> the
>>>>>>> name "TotalSum" sort of absurd.
>>>>>>>
>>>>>>> We previously considered "cumulative", which was rejected as a
>>>>>>> mouthful (it's four syllables) .
>>>>>>>
>>>>>>> You mentioned "running", which might be a more appropriate
>> modifier
>>>>>>> (RunningSum and RunningCount).
>>>>>>>
>>>>>>> What would everyone think about that?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> -John
>>>>>>>
>>>>>>> On Tue, Jul 16, 2019 at 11:27 AM Ryanne Dolan <
>>> [email protected]>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> John, I mentioned on the VOTE thread that the proposed names
>> are
>>> a
>>>>> bit
>>>>>>>> confusing,
>>>>>>>>
>>>>>>>>> given that "sum", "total", and "count" are roughly
>>> synonymous...
>>>>>>>>
>>>>>>>> In particular, TotalSum is, I think, a "running total", though
>>> the
>>>>> naming
>>>>>>>> doesn't really capture that.
>>>>>>>>
>>>>>>>> I think to avoid confusion, we should define exactly what
>>> "total" and
>>>>>>>> "sampled" are supposed to indicate, and perhaps pick
>> appropriate
>>>>> naming
>>>>>>>> from there.
>>>>>>>>
>>>>>>>> Ryanne
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jul 12, 2019 at 1:42 PM John Roesler <
>> [email protected]>
>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hey, thanks Matthias and Bruno,
>>>>>>>>>
>>>>>>>>> I agree, "Cumulative" is a mouthful. "TotalX" sounds fine to
>>> me.
>>>>>>>>>
>>>>>>>>> Also, yes, I would have liked to not have any modifier for
>>>>>>>>> "non-sampled", but there is a name conflict with Sum.
>>>>>>>>>
>>>>>>>>> I'll update the KIP to reflect "TotalX" and then start the
>> vote
>>>>> thread.
>>>>>>>>>
>>>>>>>>> Thanks again,
>>>>>>>>> -John
>>>>>>>>>
>>>>>>>>> On Fri, Jul 12, 2019 at 11:27 AM Bruno Cadonna <
>>> [email protected]
>>>>>>
>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> OK, makes sense. Then, I am in favour of TotalCount and
>>> TotalSum.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Bruno
>>>>>>>>>>
>>>>>>>>>> On Fri, Jul 12, 2019 at 12:57 AM Matthias J. Sax <
>>>>>>> [email protected]>
>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> `Sum` is an existing name, for the "sampled sum" metric,
>>> that
>>>>> gets
>>>>>>>>>>> deprecated. Hence, we cannot use it.
>>>>>>>>>>>
>>>>>>>>>>> If we cannot use `Sum` and use `TotalSum`, we should also
>>> not
>>>>> use
>>>>>>>>>>> `Count` but `TotalCount` for consistency.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -Matthias
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 7/11/19 12:58 PM, Bruno Cadonna wrote:
>>>>>>>>>>>> Hi John,
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you for the KIP.
>>>>>>>>>>>>
>>>>>>>>>>>> LGTM
>>>>>>>>>>>>
>>>>>>>>>>>> I also do not like CumulativeSum/Count so much. I
>>> propose to
>>>>> just
>>>>>>>>> call
>>>>>>>>>>>> it Sum and Count.
>>>>>>>>>>>>
>>>>>>>>>>>> I understand that you want to unequivocally distinguish
>>> the
>>>>> two
>>>>>>>>> metric
>>>>>>>>>>>> functions by their names, but I have the feeling the
>>> names
>>>>> become
>>>>>>>>>>>> artificially complex. The exact semantics can also be
>>>>> documented
>>>>>>> in
>>>>>>>>>>>> the javadocs, which btw could also be improved in those
>>>>> classes.
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Bruno
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Jul 11, 2019 at 8:25 PM Matthias J. Sax <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for the KIP. Overall LGTM.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The only though I have is, if we may want to use
>>> `TotalSum`
>>>>> and
>>>>>>>>>>>>> `TotalCount` instead of `CumulativeSum/Count` as
>> names?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 7/11/19 9:31 AM, John Roesler wrote:
>>>>>>>>>>>>>> Hi Kafka devs,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'd like to propose KIP-488 as a minor cleanup of
>> some
>>> of
>>>>> our
>>>>>>>>> metric
>>>>>>>>>>>>>> implementations.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> KIP-488:
>> https://cwiki.apache.org/confluence/x/kkAyBw
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Over time, iterative updates to these metrics has
>>> resulted
>>>>> in a
>>>>>>>>> pretty
>>>>>>>>>>>>>> confusing little collection of classes, and I've
>>> personally
>>>>>>> been
>>>>>>>>>>>>>> involved in three separate moderately time-consuming
>>>>>>> iterations of
>>>>>>>>> me
>>>>>>>>>>>>>> or someone else trying to work out which metrics are
>>>>>>> available, and
>>>>>>>>>>>>>> which ones are desired for a given use case. One of
>>> these
>>>>> was
>>>>>>>>> actually
>>>>>>>>>>>>>> a long-running bug in Kafka Streams' metrics, so not
>>> only
>>>>> has
>>>>>>> this
>>>>>>>>>>>>>> confusion been a time sink, but it has also led to
>>> bugs.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm hoping this change won't be too controversial.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>
>>
>

signature.asc
Description: OpenPGP digital signature

Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics

Reply via email to