Re: Let's eliminate the Module List

Tim Bunce Thu, 19 Aug 2004 06:02:33 -0700

On Wed, Aug 18, 2004 at 04:54:32PM -0500, Andy Lester wrote:
> I propose eliminating the Long Module List.  I'm talking about
> http://www.cpan.org/modules/00modlist.long.html (2998 modules), not
> http://www.cpan.org/modules/01modules.index.html (6800 modules).

I, as the original author of the document, have said the same several
times recently.

There are a lot of issues and much history in this area.

I've appended a couple of my emails where some of this has been
discussed recently. (It doesn't help that so many mailing lists are
relevant so duplicate fragmented discussions are common.)

While many people moan, few people have the big picture, and fewer still
have volunteered to put in real effort to change things.

Andreas, please delete modules/00modlist.long.html ASAP and remove the
four links in modules/index.html.

> =item * Inclusion on the list is effectively arbitrary.
> 
> It doesn't mean anything to have a module on that list.  It's certainly
> not a stamp of quality.  I don't mean to ignite the debate over whether
> there should be some "Perl Approved CPAN module" apparatus should exist;
> only that inclusion on the Module List is not it.

Take care not to confuse the issues here. It doesn't mean anything
to have a module listed in 00modlist.long.html only because it's
out of date. Getting it up to date would fix that but is pointless now.
The file should be deleted.

But you've not addressed 03modlist.data.gz. See below.

> =item * The resources used could be better used elsewhere.
> 
> There's significant amount of human time and machine resources that go
> into maintaining the Long Module List.  For that matter, it's a waste
> of developer time proposing inclusion on a list that nobody looks at.

You're confusing the issues here. Zero resources are used to maintain
00modlist.long.html, and 03modlist.data.gz is maintained via the PAUSE db.

The only human time relates to how the [EMAIL PROTECTED] list operates.

Module registration typically only requires one of the [EMAIL PROTECTED]
moderators to click on a link in an automatically generated email.

The fundamental problem is that [EMAIL PROTECTED] works on a "silence is
acceptance" model to allow negative comments to be raised by a moderator
who may only read the list every few days. But after a few days have passed
it's hard for moderators to remember to go back over old emails and
approve them. That process was meant to be automated (via RT) but it
hasn't happened yet.

> The one bit of value that I see in this process is where Graham looks
> at submissions that people have sent in and, if something seems like
> it's duplicate effort, tries to redirect the author to reduce the
> duplication. (http://www.nntp.perl.org/group/perl.modules/34207)

Actually there's a team of people on [EMAIL PROTECTED] who do the
'moderating'. With no disrespect to Graham I'm sure he'd agree that
brian d foy has been doing most of that work in recent months.

> Unfortunately, that requires the author to submit a proposal for
> inclusion, and since fewer than half of the authors submit the modules,
> it's hardly a complete filter.
> 
> I welcome your thoughts.  How can we capture the good part of the module
> list (the human filtering), and remove the obsoleted infrastructure?

I'd be interested in your thoughts about the points discussed below.

[I've added a few comments in square brackets to some points.]

Tim.

Date: Mon, 16 Feb 2004 10:43:13 +0000
From: Tim Bunce <[EMAIL PROTECTED]>
To: Sam Vilain <[EMAIL PROTECTED]>
Cc: Tim Bunce <[EMAIL PROTECTED]>, [EMAIL PROTECTED],
   Andreas J Koenig <[EMAIL PROTECTED]>
Subject: Re: Module lists: defining the problem, restating the goals [was Re: OK, so we've decided...]
Message-ID: <[EMAIL PROTECTED]>
References: <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]>
In-Reply-To: <[EMAIL PROTECTED]>

On Mon, Feb 16, 2004 at 10:37:12AM +1300, Sam Vilain wrote:
> On Mon, 16 Feb 2004 01:32, Tim Bunce wrote;
> 
>   > I'd like to see a summary of what those "needs of the community"
>   > are.  (Maybe I missed it as I've not been following as closely as
>   > I'd have liked. In which case a link to an archived summary would
>   > be great.)
>   > It's very important to be clear about what the problems actually
>   > are.
> 
> I don't really want to argue this side of things, I think that the
> problems pretty much speak for themselves.  But I hate unspoken
> consensus, so let me suggest a few from my perspective; this applies
> to the combined Perl 5 modules list / using search.cpan.org:

I'll play devil's advocate here and point out some alternative remedies
for the problems. By doing so I'm _not_ trying to detract from your
suggestion, which I like, I'm just trying to show how existing mechanisms
could be improved incrementally.

>   a) searching for modules for a particular task takes a long time and
>      unless you get your key words right, you might not find it at
>      all.  Refer the recent Mail::SendEasy thread.

Calls for a richer set of categories and cross-links of some kind.
(Editorial content alone is basically just more words to a search engine.)

>   b) it is very difficult to find good reviews weighing the pros and
>      cons of similar modules; they exist, but are scattered.
> 
>      CPAN ratings was a nice idea, but has too many "First Post!"
>      style reviews to be useful in its current form IMHO.

Argues for moderation of reviews and a minimum review length.
A "was this review helpful" mechanism would also help to bring
better reviews to the top.  Also, search.cpan.org should not
just show the overall rating, it should show the underlying three
individual ratings (docs, interface, ease of use).

[Elaine had cogent words to say on this hot topic in this thread:
http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&safe=off&selm=20040723072713.GZ26364%40chaos.wustl.edu
 ]

>   c) it is nearly impossible to tell which modules are the wisest
>      choices from a community size point of view; using modules that
>      are more likely to fall out of maintenance is easy to do.

Argues for more stats. I think useful *relative* download stats
could be extracted from a sample of CPAN sites. Also search.cpan.org
could provide relative page-*view* stats for modules.

[Relative is the key word here. But I tend to agree with Elaine and
others who say that such stats are meaningless. They'll never be
sufficiently useful and even if they were accurate they'd tend to
promote unstable modules over stable modules that people don't have
to upgrade often.]

>   d) some great modules are not registered (I am referring of course
>      to such masterpieces as Pixie, Heritable::Types, Maptastic :),
>      Spiffy, Autodia, Want ... and those are just the ones missing
>      in my bag of tricks)

Argues for fixing the registration process.

[See below]

>   > Originally the Module List had two goals:
>   >  1: to help people find perl modules for a particular task.
>   >  2: to provide a second-tier of modules above the 'anarchy' of 
>   >     people uploading half-baked ideas with half-baked names.
> 
> Honourable goals, which it solved adequately for a period of time, and
> full credit where it is due.
> 
> But now let's look at where we are.  We've got masses of modules,
> truckloads of categories and thousands of contributors.  This task
> cannot be left in the hands of a handful of hackers, no matter how
> much awe they inspire, they probably still have lives and day jobs ;)

The registration process can, and should, be automatic for any modules
for which no one objects. You'd apply and RT would automatically
register if no one commented on the application.
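
As a rough illustration of that rule, here is a minimal sketch of
"silence is acceptance". It is not how PAUSE or RT actually work: the
Application class, the decide() function and the 7-day window are all
assumptions made up for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical review window; the thread only says "a few days".
REVIEW_WINDOW = timedelta(days=7)


@dataclass
class Application:
    """One module registration request (illustrative structure only)."""
    module: str
    submitted: datetime
    comments: list = field(default_factory=list)  # moderator objections or questions


def decide(app, now):
    """Return 'needs-review', 'registered', or 'pending' for one application."""
    if app.comments:
        return "needs-review"   # someone spoke up, so a human has to decide
    if now - app.submitted >= REVIEW_WINDOW:
        return "registered"     # silence for the whole window counts as acceptance
    return "pending"            # still inside the comment window
```

Run periodically over the open applications, a check like this would
remove the need for a moderator to remember to go back over old emails.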

> I will maintain that the current format, or even simply adding some
> more fields to the database is *not* enough information to give
> uninformed people looking for a module the information to make an
> informed decision.

[I think the most practical way to address the "help me find good
modules for doing X" is to pursue the multiple SDK model.
Have people and groups publish and publicize their own favorite
groups of modules (as Bundles) for broad application areas.]

> It is my gut feeling that only editorial content, managed by people
> who are experts in the field, will truly perform this task - and that
> to gain maximum support, that it should be included in the content
> mirrored along with the rest of cpan.org.

I agree that comparative editorial reviews would be very valuable
for Goal 1 above. It wouldn't address Goal 2 effectively at all.

[Though I agree with Elaine and others who point out that comparative
editorial reviews take a lot of effort and are likely to be rare.]

> I think we're mature enough as a community to be able to produce this
> content without it dissolving into flamewars or being too one-sided.
> 
> In particular, I really think that as little red tape should be
> applied to this system as possible.  Let's just set up a few CVS /
> subversion accounts, to edit content that is autopublishing to the
> www.cpan.org site, with a few disclaimers chucked on the bottom.
> LARTing the naive and troublesome as appropriate.  

I agree. This all seems very similar to the DMOZ.org project that
maintains reviews of millions of web sites:
        "6,095,104 sites - 61,277 editors - 551,043 categories"
That's a mature and very successful model (used by Google Directory etc.)
that's well worth learning from.

>   > The text file is out of date. The underlying database isn't:
>      [...]
>   > Please work with the PAUSE system, and Andreas and myself, to
>   > enhance what already exists (which includes a UI for module
>   > authors to pick which category they want the module in).
> 
> I'd be honoured to.  I think that the plan you propose would be an
> excellent step forward, and whether or not the editorial plan gains
> acceptance then it has merit.
> 
> Unfortunately right now I have to move house :-) but I should be able
> to dedicate at some time this week to research and kick-start this
> recategorisation effort.

Ultimately countless good ideas fail for lack of time.

Tim.

Date: Mon, 16 Feb 2004 15:44:42 +0000
From: Tim Bunce <[EMAIL PROTECTED]>
To: Zefram <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]
Subject: Future of the "Module List"
Message-ID: <[EMAIL PROTECTED]>
References: <[EMAIL PROTECTED]>
In-Reply-To: <[EMAIL PROTECTED]>

On Sun, Feb 15, 2004 at 08:20:39PM +0000, Zefram wrote:
> I am rather confused.
> 
> Many documents on PAUSE and CPAN refer to the Perl 5 module list.
> It is, for example, to be consulted when considering module naming, and
> to search for pre-existing modules.  Documents all over the web refer
> to it.  Without exception that I have been able to find, all of these
> references point to <http://www.cpan.org/modules/00modlist.long.html>
> as the place to get hold of the module list.
> 
> The document served at that URI is dated 2002-08-27 internally,
> and has a timestamp of 2002-08-28.  It lists (at a rough estimate)
> somewhere in the region of only half the number of modules that appear
> in the by-module hierarchy.  Evidently this is the module list as it
> was eighteen months ago.
> 
> PAUSE appears to maintain all the metadata that goes into the module
> list.  search.cpan.org searches on a recent version of this metadata.
> Evidently the module list is still being maintained.

This message I posted to module-authors recently may help explain:

  http://www.mail-archive.com/[EMAIL PROTECTED]/msg01752.html

(I could have cross-posted it here. I certainly meant to at least
tell people here that an important discussion was happening on
module-authors. So thanks for bringing this up now. Well timed.)

I'd recommend all [EMAIL PROTECTED] subscribers read that message.

> This is why I am mailing you to ask: what's going on?  Why is such
> an outdated module list being published in an authoritative location,
> and where can I get an up-to-date list?

The Module List *document* was maintained by hand.  When management of
the Module List *data* was automated there was a desire to automate
maintenance of the document but the document had a slightly richer
structure than the data. That small hurdle meant automation never
happened and the document was left unmaintained.

Around the same time search.cpan.org became functional so the
document had less relevance and busy people had other things to do.

I'll happily concede that the *document* isn't important these days.
But I feel strongly that the *principle* (of moderated naming and
categorization) is.

The main pieces currently missing are:

1. Automated handling of module registration. [Where has that got to?]

2. Better integration of registration data into search.cpan.org
   So registration details are included in search results, for example.

3. A 'fast path' process to register many modules that are in
   widespread use but are not registered. So then the majority of
   modules are registered and 

The alternative is just to give up. Seriously. We could just stop
banging our heads against authors uploading half-baked ideas with
half-baked names (which are "self explanatory" to them).

The hope would be that out of the anarchy would rise some new
form of organization[*] (in the broadest sense) that would help
hapless users find what they're looking for.

Do we want to do that? [I'm asking this question in all seriousness.]

Tim.

p.s. I think the term "Module List" should be deprecated in favor
of talking about "Registered Modules" and "module registration" etc.

[*] Presumably based on metadata, rantings, editorial reviews, download
stats and/or whatever else people can come up with.

[Two final points:

Much good planning was achieved at the CPLAN meeting in London in
October last year, some of which is relevant here:
  http://cplan.kwiki.org/index.cgi?MinutesAccordingToAutrijus

Perl 6 will raise its own set of issues, especially with regard
to fully-qualified module naming.
]
