Re: CPAN-river: can graph calculation be modified?

David Golden Fri, 02 Feb 2018 09:46:12 -0800

It's possible that an *alternate* simplest thing might be more meaningful:
for each distribution, count the number of distinct *authors* whose
distributions depend on it (including, for the sake of example, the same
author, but only once).


In the Foo case:

   - Foo has 3 authors depending on it
   - Foo-Bar has 3 authors depending on it
   - Foo-Bar-Noggin and Foo-Bar-Baz have 0 authors depending on them
   - Foo-Bar-A has 1 author depending on it

In Neil's Thing case:

   - Thing has 2
   - Plant has 1
   - Fruit and Banana each have 1
   - Silver-Banana has 0

In Tux's Thing case, all the counts just increase by one and Distasteful
has 0.

Consider this case:
Zot (Larry) -> Pow (Moe) -> Splat (Curly) -> Whiff (Moe) -> Oof (Larry)


   - Zot has 3
   - Pow has 3
   - Splat has 2
   - Whiff has 1
   - Oof has 0
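
The counts for this chain can be reproduced with a minimal sketch. It is
written in Python purely for illustration and is not taken from the
zzz-index-cpan-meta programs. The names AUTHOR, EDGES and
downstream_author_counts are made up for this example, and an arrow
"A -> B" is assumed to mean "B depends on A".

```python
from collections import defaultdict

# Hypothetical data: the chain above. Each edge is (upstream, downstream),
# i.e. the downstream distribution depends on the upstream one.
AUTHOR = {"Zot": "Larry", "Pow": "Moe", "Splat": "Curly",
          "Whiff": "Moe", "Oof": "Larry"}
EDGES = [("Zot", "Pow"), ("Pow", "Splat"),
         ("Splat", "Whiff"), ("Whiff", "Oof")]


def downstream_author_counts(author, edges):
    """For each distribution, count the distinct authors of everything
    that depends on it, directly or transitively. The distribution's own
    author is counted (once) if they also own something downstream."""
    dependents = defaultdict(set)
    for upstream, downstream in edges:
        dependents[upstream].add(downstream)

    counts = {}
    for dist in author:
        seen, stack = set(), [dist]
        while stack:  # depth-first walk over all transitive dependents
            for child in dependents.get(stack.pop(), ()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        seen.discard(dist)  # a cycle must not make a dist its own dependent
        counts[dist] = len({author[d] for d in seen})
    return counts


for dist, n in downstream_author_counts(AUTHOR, EDGES).items():
    print(f"{dist} has {n}")
```

Walking the graph once per distribution is the simple approach; for the
whole CPAN index something smarter would probably be needed.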

The interesting thing about this metric to me is that it focuses on this
question: "If a module breaks, how many *people* are affected?", which
sounds a lot more like what Jim's asking.

Counting an author as 1 for any downstream by the same author is arbitrary
-- I think it simplifies the analysis and gives more or less the same
answer, but it could be done the other way, too, if people preferred.
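
To compare the two conventions, here is a rough sketch in Python (chosen
only for illustration). It assumes "the other way" means leaving a
distribution's own author out of its count; that reading, the include_self
flag, and the names AUTHOR, EDGES and author_counts are assumptions made
for this example. The data is the Zot chain, with each edge read as
(upstream, downstream).

```python
from collections import defaultdict

# Hypothetical data: the Zot chain. The second item of each edge
# depends on the first.
AUTHOR = {"Zot": "Larry", "Pow": "Moe", "Splat": "Curly",
          "Whiff": "Moe", "Oof": "Larry"}
EDGES = [("Zot", "Pow"), ("Pow", "Splat"),
         ("Splat", "Whiff"), ("Whiff", "Oof")]


def author_counts(author, edges, include_self=True):
    """Count distinct authors of all transitive dependents.

    include_self=True  counts the dist's own author once if they own
                       something downstream (the convention used above).
    include_self=False drops the dist's own author from the count
                       (assumed meaning of "the other way").
    """
    dependents = defaultdict(set)
    for upstream, downstream in edges:
        dependents[upstream].add(downstream)

    counts = {}
    for dist in author:
        seen, stack = set(), [dist]
        while stack:
            for child in dependents.get(stack.pop(), ()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        seen.discard(dist)
        authors = {author[d] for d in seen}
        if not include_self:
            authors.discard(author[dist])
        counts[dist] = len(authors)
    return counts


with_self = author_counts(AUTHOR, EDGES, include_self=True)
without_self = author_counts(AUTHOR, EDGES, include_self=False)
```

Under that reading the chain comes out as Zot 2, Pow 2, Splat 2, Whiff 1,
Oof 0, so only distributions whose own author reappears downstream change.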

David


On Fri, Feb 2, 2018 at 9:48 AM, James E Keenan <jkee...@pobox.com> wrote:

> Overall Question:  How can we implement different ways of constructing the
> CPAN river?
>
> Background:
>
> Since about this time last year I've had occasion to use the concept of
> CPAN-river to derive lists of distributions to be tested against whatever
> Perl 5 blead is of the moment.  In particular, for the last three months
> I've been creating assessments of the impact of monthly Perl 5 development
> releases on the "top 1000" of the CPAN river.  (See, e.g.,
> http://thenceforward.net/perl/misc/cpan-river-1000-perl-5.27-master.psv.gz
> )
>
> To calculate the CPAN river, I've been using the programs developed by
> David Golden found here:
>
> https://github.com/dagolden/zzz-index-cpan-meta
>
> ... with one modification:  a local branch for the second of the three
> programs cited there.  I use a local branch because I'm using Linux and
> cannot install Ramdisk.
>
> Problem:
>
> As I've stared at this data over the past year I've become aware that the
> order in which distros appear in the river is not necessarily the most
> useful for assessing the real-world impact of changes in blead. Put less
> charitably, the CPAN river can be "gamed."  It is possible for a person to
> release a large number of distributions which have dependencies on other
> distributions by the same author.  That can boost some of those
> distributions high up into the CPAN river -- into, say, the "top 1000" that
> I use in my monthly program.
>
> But if that author's distributions are not depended upon by *other*
> authors' distributions then they are arguably less important than those
> such as Module-Build and DateTime which are depended upon by vast numbers
> of distros written by people other than those distros' maintainers.
>
> Since "testing against blead" programs take hours to run, I would like to
> have that time spent focusing on what I consider to be more relevant
> distros.
>
> For the 5.29.* development cycle starting in May of this year, I would
> like to be able to use a ranking of CPAN distros which goes beyond asking:
>
> * "How many other distributions depend on this one?"
>
> ... to asking:
>
> * "How many distributions by other authors/maintainers depend on this one?"
>
> Would that be feasible?  Has anyone attempted this already?
>
> Thank you very much.
> Jim Keenan
>



-- 
David Golden <x...@xdg.me> Twitter/IRC/GitHub: @xdg
