A. Pagaltzis writes:

> Why does it have to be either/or?
>
> There could be two keyword lists, one with fixed keywords, and the
> other freeform. Their names would have to be chosen carefully to
> suggest this as the intended use (rather than filling both with the
> same keywords) -- maybe ``keywords'' and ``additional_keywords'' or
> something.

Warning: Wild conjecture and multiple unrelated crazy ideas ahead.

This is becoming _way_ too complicated. The more complicated it becomes, the less chance there is of Cpan uploaders understanding it, doing it well, or even bothering with it -- even presuming that they hear about it in the first place.

And I still reckon most humans are approximately appalling at picking appropriate keywords anyway. A system like you're proposing still requires an individual module's author to think of the right keywords and bother to do this, which is putting a single point of failure in the system.

However, improved Cpan searching would be welcome. There have been occasions when it's taken me a lot of searching to locate a module (or sometimes I didn't discover it by searching, but only encountered it by chance later).

At that particular moment in time I am in the position of having the name of a module plus a search term that I had hoped would lead to that module but didn't. This isn't some theoretical classification, or term that somebody might search for, but one that I actually did use.

It has occurred to me that at such a point it'd be good if I could somehow 'tag' the module in question with the search term, so that people searching for that term in future would find that module.

In other words, it's the search-engine users, not the module authors, who define the keywords. So no individual has to be great at classifications, just a collective group being OK at it on average. It would also mean the keywords used are the vocabulary of the target audience; it doesn't actually matter if some of the keywords are not what the author would use (or even if they're 'wrong'), so long as that audience are finding them useful.

Could something like this be done just with the existing Cpan Ratings system? If you find a module is good for a particular task then you give it a high rating and mention the task it's good for in the review?
So if the text of reviews were searched, and the ratings contributed towards the ranking of results, would that be enough? Or would the 'noise' of other words in reviews make this useless? The fact that Google works so well parsing significance from plain text makes me think it's got a chance.

Another possibility is just allowing any user to 'assign' any keyword to any module, and having those keywords added to the list of things the search engine looks through. That's more likely to be open to abuse from authors -- either deviously trying to get more attention for 'their' module, or inadvertently picking bad keywords -- but no more so than with something in META.yml.

The system degrades as it tends towards every keyword being assigned to every module, of course; allowing people to remove keywords is problematic (if a module is only slightly relevant to a keyword, is having it assigned helpful for those rare situations, or making it worse for the common situations where the module isn't relevant?).

This kind of conflict between different people's views also occurs on wikis, at least some of which seem to have a reasonable way of dealing with them. So perhaps each module (or distribution) page just needs a wiki area where any users can add any useful annotations, keywords, or whatever (or correct those made by previous users), and have the wiki-like comments be searchable too?

Would that overlap too much with the Cpan Ratings system -- where's the line between adding an annotation providing useful (hopefully factual) information ("This module requires Windows 2000 or newer", say, or "Installing this module will phone home to the author's web-server") and posting a (subjective) hatchet-job review?

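
To make the review-searching idea above a little more concrete, here is a rough sketch in Python of matching a search term against review text and letting the ratings drive the ranking. Everything in it is an assumption for illustration: the module names, the sample reviews, the `search_reviews` function, and the choice of simply summing the ratings of matching reviews. It says nothing about how Cpan Ratings actually stores or exposes its data.

```python
from collections import defaultdict

# Invented sample data: (module name, rating out of 5, review text).
# The module names and reviews are hypothetical, not real Cpan Ratings entries.
REVIEWS = [
    ("Foo::CSV", 5, "Great for parsing CSV files quickly"),
    ("Foo::CSV", 4, "Reliable CSV parsing and writing"),
    ("Bar::Excel", 4, "Good for reading Excel spreadsheets"),
    ("Baz::CSVQuery", 2, "Works for querying CSV files but slow"),
]


def search_reviews(term, reviews=REVIEWS):
    """Rank modules by the summed ratings of the reviews that mention term."""
    term = term.lower()
    scores = defaultdict(int)
    for module, rating, text in reviews:
        # Whole-word match against the review text, ignoring case.
        if term in text.lower().split():
            scores[module] += rating
    # Highest total first; ties broken alphabetically by module name.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

With this sample data a search for "csv" puts Foo::CSV ahead of Baz::CSVQuery, because two well-rated reviews mention the term against one poorly-rated one. Summing ratings is only one possible weighting; it also rewards modules that merely have many reviews, which is part of the 'noise' question.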
Or perhaps module-keyword pairings are the way forward but also need some kind of score against them, whereby more people advocating a keyword gives a higher score (and possibly allowing negative-allocations too); that way outliers are not so much of a problem, and we only need for users to be right on average, kind-of similar to how 20Q.net learns things from the average/consensus of many people's views:

  http://www.20q.net/

Do we actually want something like a 20Q.net instance where the only objects the system knows about are Perl modules? If you know a module then you can teach the system about it; if you're looking for a module then you answer its questions to describe the characteristics you're after and look at the list of suggested modules it throws up?

[Don't say I didn't warn you about the crazy ideas.]

Smylers
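
The scored module-keyword pairing floated above could look something like the following Python sketch. It is only an illustration under stated assumptions: the `KeywordVotes` class, its `vote` and `search` methods, the `min_score` cut-off, and the module names are all hypothetical, and a real system would also need to deal with who is allowed to vote and how often.

```python
from collections import defaultdict


class KeywordVotes:
    """Hypothetical store of user votes on (module, keyword) pairings."""

    def __init__(self):
        # keyword -> module -> net score (positive votes minus negative votes)
        self._scores = defaultdict(lambda: defaultdict(int))

    def vote(self, module, keyword, delta=1):
        """Record one vote; use delta=-1 to say the keyword does not fit."""
        self._scores[keyword.lower()][module] += delta

    def search(self, keyword, min_score=1):
        """Return (module, score) pairs for keyword, best-supported first."""
        modules = self._scores.get(keyword.lower(), {})
        ranked = [(m, s) for m, s in modules.items() if s >= min_score]
        return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


votes = KeywordVotes()
votes.vote("Foo::CSV", "spreadsheet")
votes.vote("Foo::CSV", "spreadsheet")
votes.vote("Bar::Excel", "spreadsheet")
votes.vote("Baz::Mailer", "spreadsheet")            # an outlier tags it...
votes.vote("Baz::Mailer", "spreadsheet", delta=-1)  # ...and someone disagrees
results = votes.search("spreadsheet")
```

Here `results` lists Foo::CSV first, then Bar::Excel, and leaves out Baz::Mailer, whose votes cancel to zero. That is the "right on average" behaviour: a single odd tagging neither pollutes the results nor needs anybody to delete it outright.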