After a coffee, I don't see the point of still calling it "k", so "max_n" is my vote, for what it's worth.
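
For concreteness, here is a minimal sketch of what a "max_n" / "argmax_n" pair could look like. Both names are hypothetical (neither exists in NumPy today), the sketch assumes a 1-D input, and np.argpartition is just one possible way to do it, not a proposed implementation. The stdlib heapq.nlargest() mentioned in the thread below is shown for comparison.

```python
import heapq

import numpy as np


def argmax_n(a, n):
    """Indices of the n largest entries of a 1-D array, largest first.

    Hypothetical helper for illustration; not part of NumPy.
    """
    a = np.asarray(a)
    if not 0 < n <= a.size:
        raise ValueError("n must satisfy 0 < n <= a.size")
    # argpartition moves the n largest entries into the last n slots,
    # in no particular order, without sorting the whole array.
    idx = np.argpartition(a, a.size - n)[a.size - n:]
    # Sort only those n candidates, descending by value.
    return idx[np.argsort(a[idx])[::-1]]


def max_n(a, n):
    """The n largest values of a 1-D array, largest first (hypothetical)."""
    a = np.asarray(a)
    return a[argmax_n(a, n)]


values = np.array([3, 9, 1, 7, 5])
print(max_n(values, 2))                     # the two largest values
print(argmax_n(values, 2))                  # where they sit in `values`
print(heapq.nlargest(2, values.tolist()))   # stdlib equivalent for plain lists
```

The order among tied values is not specified in this sketch.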
On Sun, May 30, 2021 at 8:38 AM Ilhan Polat <ilhanpo...@gmail.com> wrote:

> Since this going into the top namespace, I'd also vote against the
> matlab-y "topk" name. And even matlab didn't do what I would expect and
> went with maxk
>
> https://nl.mathworks.com/help/matlab/ref/maxk.html
>
> I think "max_k" is a good generalization of the regular "max". Even when
> auto-completing, this showing up under max makes sense to me instead of
> searching them inside "t"s. Besides, "argmax_k" also follows suite, that of
> the previous convention. To my eyes this is an acceptable disturbance to an
> already very crowded namespace.
>
> a few moments later....
>
> But then again an ugly idea rears its head proposing this going into the
> existing max function. But I'll shut up now :)
>
> On Sun, May 30, 2021 at 12:50 AM Robert Kern <robert.k...@gmail.com>
> wrote:
>
>> On Sat, May 29, 2021 at 3:35 PM Daniele Nicolodi <dani...@grinta.net>
>> wrote:
>>
>>> What does k stand for here? As someone that never encountered this
>>> function before I find both names equally confusing. If I understand
>>> what the function is supposed to be doing, I think largest() would be
>>> much more descriptive.
>>>
>>
>> `k` is the number of elements to return. `largest()` can connote that
>> it's only returning the one largest value. It's fairly typical to include a
>> dummy variable (`k` or `n`) in the name to indicate that the function lets
>> you specify how many you want. See, for example, the stdlib `heapq`
>> module's `nlargest()` function.
>>
>> https://docs.python.org/3/library/heapq.html#heapq.nlargest
>>
>> "top-k" comes from the ML community where this function is used to
>> evaluate classification models (`k` instead of `n` being largely an
>> accident of history, I imagine). In many classification problems, the
>> number of classes is very large, and they are very related to each other.
>> For example, ImageNet has a lot of different dog breeds broken out as
>> separate classes. In order to get a more balanced view of the relative
>> performance of the classification models, you often want to check whether
>> the correct class is in the top 5 classes (or whatever `k` is appropriate)
>> that the model predicted for the example, not just the one class that the
>> model says is the most likely. "5 largest" doesn't really work in the
>> sentences that one usually writes when talking about ML classifiers; they
>> are talking about the 5 classes that are associated with the 5 largest
>> values from the predictor, not the values themselves. So "top k" is what
>> gets used in ML discussions, and that transfers over to the name of the
>> function in ML libraries.
>>
>> It is a top-down reflection of the higher level thing that people want to
>> compute (in that context) rather than a bottom-up description of how the
>> function is manipulating the input, if that makes sense. Either one is a
>> valid way to name things. There is a lot to be said for numpy's
>> domain-agnostic nature that we should prefer the bottom-up description
>> style of naming. However, we are also in the midst of a diversifying
>> ecosystem of array libraries, largely driven by the ML domain, and adopting
>> some of that terminology when we try to enhance our interoperability with
>> those libraries is also a factor to be considered.
>>
>> --
>> Robert Kern
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
>
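
To make the top-k evaluation described in the quoted message concrete, here is a minimal sketch of a top-k accuracy check. The function name `top_k_accuracy` and the toy scores and labels are made up for illustration; it assumes a 2-D array of per-class scores (one row per example) and integer class labels, and is not taken from any particular ML library.

```python
import numpy as np


def top_k_accuracy(scores, labels, k):
    """Fraction of examples whose true class is among the k highest scores.

    Hypothetical helper: `scores` has shape (n_examples, n_classes),
    `labels` has shape (n_examples,) and holds integer class indices.
    """
    scores = np.asarray(scores)
    labels = np.asarray(labels)
    # Column indices of the k highest-scoring classes per row (unordered).
    top_k = np.argpartition(scores, -k, axis=1)[:, -k:]
    # An example counts as a hit if its label appears among those k classes.
    hits = (top_k == labels[:, None]).any(axis=1)
    return hits.mean()


# Toy data: 3 examples, 3 classes.
scores = np.array([
    [0.1, 0.6, 0.3],
    [0.5, 0.2, 0.3],
    [0.2, 0.3, 0.5],
])
labels = np.array([2, 1, 2])

top1 = top_k_accuracy(scores, labels, 1)
top2 = top_k_accuracy(scores, labels, 2)
print(top1, top2)
```

Note that the quantity of interest here is the set of class indices, not the score values themselves, which is the distinction the quoted message draws between "top k" and "k largest".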