A. Pagaltzis writes:

> Why does it have to be either/or?
> 
> There could be two keyword lists, one with fixed keywords, and the
> other freeform. Their names would have to be chosen carefully to
> suggest this as the intended use (rather than filling both with the
> same keywords) -- maybe ``keywords'' and ``additional_keywords'' or
> something.

Warning:  Wild conjecture and multiple unrelated crazy ideas ahead.

This is becoming _way_ too complicated.  The more complicated it
becomes, the less chance there is of Cpan uploaders understanding it,
doing it well, or even bothering with it -- even presuming that they
hear about it in the first place.

And I still reckon most humans are approximately appalling at picking
appropriate keywords anyway.  A system like the one you're proposing
still requires each module's author to think of the right keywords and
to bother supplying them, which puts a single point of failure in the
system.

However, improved Cpan searching would be welcome.  There have been
occasions when it's taken me a lot of searching to locate a module (or
sometimes I didn't discover it by searching, but only encountered it by
chance later).  At that moment I have the name of a module plus a
search term that I had hoped would lead to that module but didn't.
This isn't some theoretical classification, or a term that somebody
might search for, but one that I actually did use.

It has occurred to me that at such a point it'd be good if I could
somehow 'tag' the module in question with the search term, so that
people searching for that term in future would find that module.  In
other words, it's the search-engine users, not the module authors who
define the keywords.  So no individual has to be great at
classifications, just a collective group being OK at it on average.

It would also mean the keywords used are the vocabulary of the target
audience; it doesn't actually matter if some of the keywords are not
what the author would use (or even if they're 'wrong'), so long as that
audience are finding them useful.
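
A minimal sketch of that idea, in Python purely for illustration.  The
class name, its methods, and the module names are all hypothetical;
nothing like this exists in any Cpan search engine as far as this
message says.  It only shows searchers, rather than authors, attaching
their own failed search terms to the module they eventually found:

```python
from collections import defaultdict


class SearchTagIndex:
    """Maps search terms to the modules that searchers say they wanted."""

    def __init__(self):
        # term -> {module name -> number of users who made that pairing}
        self._tags = defaultdict(lambda: defaultdict(int))

    def tag(self, term, module):
        """Record that somebody searching for `term` wanted `module`."""
        self._tags[term.strip().lower()][module] += 1

    def search(self, term):
        """Return tagged modules, most frequently tagged first."""
        hits = self._tags.get(term.strip().lower(), {})
        # Ties are broken alphabetically so the order is stable.
        return sorted(hits, key=lambda m: (-hits[m], m))


# Made-up module names and tags, for illustration only.
index = SearchTagIndex()
index.tag("spreadsheet", "Example::Writer")
index.tag("Spreadsheet", "Example::Parser")
index.tag("spreadsheet", "Example::Writer")
print(index.search("spreadsheet"))
```

No single tagger needs to be good at classification here; a module
simply rises as more searchers attach the same term to it.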

Could something like this be done just with the existing Cpan Ratings
system?  If you find a module is good for a particular task then you
give it a high rating and mention the task it's good for in the review?
So if the text of reviews were searched, with the ratings contributing
towards the ranking of results, would that be enough?  Or would the
'noise' of other words in reviews make this useless?  The fact that
Google works so well parsing significance from plain text makes me think
it's got a chance.
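
As a rough sketch of what "search the review text, rank by rating"
could mean, again in Python and under invented assumptions: reviews are
taken to be (module, rating from 1 to 5, text) tuples, and the scoring
rule (matched query words multiplied by the rating) is just one naive
choice, not anything Cpan Ratings actually does.  It makes no attempt
to deal with the 'noise' question.

```python
import re
from collections import defaultdict


def words_in(text):
    """Lower-cased set of alphanumeric words in `text`."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_reviews(reviews, query):
    """Rank modules by query words found in their reviews, weighted by rating.

    `reviews` is an iterable of (module, rating, text) tuples.
    """
    wanted = words_in(query)
    scores = defaultdict(float)
    for module, rating, text in reviews:
        matched = len(wanted & words_in(text))
        if matched:
            # A glowing review that mentions the task counts for more
            # than a poor review mentioning the same words.
            scores[module] += matched * rating
    return sorted(scores, key=lambda m: (-scores[m], m))


# Made-up reviews of made-up modules, for illustration only.
reviews = [
    ("Example::Csv", 5, "Great for parsing CSV files quickly"),
    ("Example::Table", 2, "Tried it for parsing CSV, too slow"),
    ("Example::Mail", 4, "Sends email reliably"),
]
print(search_reviews(reviews, "parsing csv"))
```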

Another possibility is just allowing any user to 'assign' any keyword to
any module, and have those keywords added to the list of things the
search engine looks through.  That's more likely to be open to abuse
from authors -- either deviously trying to get more attention for
'their' module, or inadvertently picking bad keywords -- but no more so
than with something in META.yml.

The system degrades as it tends towards every keyword being assigned to
every module, of course; allowing people to remove keywords is
problematic (if a module is only slightly relevant to a keyword, does
having it assigned help in those rare situations, or make things worse
in the common situations where the module isn't relevant?).

This kind of conflict between different people's views also occurs on
wikis, at least some of which seem to have a reasonable way of dealing
with them.  So perhaps each module (or distribution) page just needs a
wiki area where any user can add useful annotations, keywords, or
whatever (or correct those made by previous users), with those
wiki-like comments being searchable too?

Would that overlap too much with the Cpan Ratings system?  Where's the
line between adding an annotation that provides useful (hopefully
factual) information ("This module requires Windows 2000 or newer",
say, or "Installing this module will phone home to the author's
web-server") and posting a (subjective) hatchet-job review?

Or perhaps module-keyword pairings are the way forward but also need
some kind of score against them, whereby more people advocating a
keyword gives a higher score (and possibly allowing negative-allocations
too); that way outliers are not so much of a problem, and we only need
for users to be right on average, kind-of similar to how 20Q.net learns
things from the average/consensus of many people's views:

  http://www.20q.net/
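
One way the scored pairings might look, sketched in Python.  The class,
the `min_score` threshold and the one-vote-per-call rule are all
assumptions made for the example, and it says nothing about how
20Q.net itself is implemented:

```python
from collections import defaultdict


class KeywordVotes:
    """Net score per (keyword, module) pair from many users' votes."""

    def __init__(self):
        # (keyword, module) -> votes for minus votes against
        self._score = defaultdict(int)

    def vote(self, keyword, module, agree=True):
        """Add one vote for (or, with agree=False, against) a pairing."""
        self._score[(keyword.lower(), module)] += 1 if agree else -1

    def search(self, keyword, min_score=1):
        """Modules whose net score for `keyword` reaches `min_score`."""
        keyword = keyword.lower()
        ranked = [
            (score, module)
            for (kw, module), score in self._score.items()
            if kw == keyword and score >= min_score
        ]
        # Highest score first; alphabetical order breaks ties.
        return [m for _, m in sorted(ranked, key=lambda p: (-p[0], p[1]))]


# Made-up votes on made-up modules, for illustration only.
votes = KeywordVotes()
for _ in range(3):
    votes.vote("template", "Example::Tmpl")
votes.vote("template", "Example::Mail")               # one outlier
votes.vote("template", "Example::Mail", agree=False)
votes.vote("template", "Example::Mail", agree=False)
print(votes.search("template"))
```

Because a pairing only shows up once its net score passes the
threshold, a single odd vote neither adds a module to the results nor
removes one.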

Do we actually want something like a 20Q.net instance where the only
objects the system knows about are Perl modules?  If you know a module
then you can teach the system about it; if you're looking for a module
then you answer its questions to describe the characteristics you're
after and look at the list of suggested modules it throws up?
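
Purely to make that concrete, here is a toy sketch in Python.  It is
not how 20Q.net works (this message doesn't describe its internals);
the class, the question strings and the scoring are all invented.
People who know a module 'teach' yes/no answers, and a searcher's
answers are compared against the accumulated consensus:

```python
from collections import defaultdict


class ModuleGuesser:
    """Toy question-and-answer matcher over a set of known modules."""

    def __init__(self):
        # (module, question) -> net 'yes' votes from people who know it
        self._weights = defaultdict(int)
        self._modules = set()

    def teach(self, module, answers):
        """`answers` maps question text to True/False for this module."""
        self._modules.add(module)
        for question, yes in answers.items():
            self._weights[(module, question)] += 1 if yes else -1

    def suggest(self, answers):
        """Rank known modules by agreement with the searcher's answers."""
        def agreement(module):
            total = 0
            for question, yes in answers.items():
                weight = self._weights.get((module, question), 0)
                total += weight if yes else -weight
            return total

        return sorted(self._modules, key=lambda m: (-agreement(m), m))


# Made-up modules and questions, for illustration only.
guesser = ModuleGuesser()
guesser.teach("Example::Csv", {"parses files": True, "needs network": False})
guesser.teach("Example::Http", {"parses files": False, "needs network": True})
print(guesser.suggest({"parses files": True}))
```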

[Don't say I didn't warn you about the crazy ideas.]

Smylers
