Since this is going into the top namespace, I'd also vote against the MATLAB-y "topk" name. And even MATLAB didn't do what I would have expected; it went with maxk:
https://nl.mathworks.com/help/matlab/ref/maxk.html

I think "max_k" is a good generalization of the regular "max". Even when auto-completing, having this show up under max makes more sense to me than hunting for it among the "t"s. Besides, "argmax_k" also follows suit, in line with the existing max/argmax convention. To my eyes this is an acceptable disturbance to an already very crowded namespace.

a few moments later....

But then again an ugly idea rears its head: folding this into the existing max function. But I'll shut up now :)

On Sun, May 30, 2021 at 12:50 AM Robert Kern <robert.k...@gmail.com> wrote:

> On Sat, May 29, 2021 at 3:35 PM Daniele Nicolodi <dani...@grinta.net>
> wrote:
>
>> What does k stand for here? As someone that never encountered this
>> function before I find both names equally confusing. If I understand
>> what the function is supposed to be doing, I think largest() would be
>> much more descriptive.
>>
>
> `k` is the number of elements to return. `largest()` can connote that it's
> only returning the one largest value. It's fairly typical to include a
> dummy variable (`k` or `n`) in the name to indicate that the function lets
> you specify how many you want. See, for example, the stdlib `heapq`
> module's `nlargest()` function.
>
> https://docs.python.org/3/library/heapq.html#heapq.nlargest
>
> "top-k" comes from the ML community where this function is used to
> evaluate classification models (`k` instead of `n` being largely an
> accident of history, I imagine). In many classification problems, the
> number of classes is very large, and they are very related to each other.
> For example, ImageNet has a lot of different dog breeds broken out as
> separate classes. In order to get a more balanced view of the relative
> performance of the classification models, you often want to check whether
> the correct class is in the top 5 classes (or whatever `k` is appropriate)
> that the model predicted for the example, not just the one class that the
> model says is the most likely. "5 largest" doesn't really work in the
> sentences that one usually writes when talking about ML classifiers; they
> are talking about the 5 classes that are associated with the 5 largest
> values from the predictor, not the values themselves. So "top k" is what
> gets used in ML discussions, and that transfers over to the name of the
> function in ML libraries.
>
> It is a top-down reflection of the higher level thing that people want to
> compute (in that context) rather than a bottom-up description of how the
> function is manipulating the input, if that makes sense. Either one is a
> valid way to name things. There is a lot to be said for numpy's
> domain-agnostic nature that we should prefer the bottom-up description
> style of naming. However, we are also in the midst of a diversifying
> ecosystem of array libraries, largely driven by the ML domain, and adopting
> some of that terminology when we try to enhance our interoperability with
> those libraries is also a factor to be considered.
>
> --
> Robert Kern
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>