On 6/8/11 4:36 PM, Vikraman wrote:
> I'm working on the `Package statistics` project this year. Till now, I
> have managed to write a client and server[0] to collect the following
> information from hosts:

Excellent, good luck with the idea! I think that better information
about how Gentoo is actually used will greatly help improve it.

> Is there a need to collect files installed by a package ? Doesn't PFL[1]
> already provide that ?

Well, PFL is not an official Gentoo project. It might be useful, but I
wouldn't say it's a priority.

> Please provide some feedback on what other data should be collected, etc.

In my opinion it's *not* about collecting as much data as possible. I
think it's most important to get the core functionality working really
well, and to convince as large a percentage of users as possible to
enable reporting (so that the results, hopefully, accurately represent
the user base). Please note that in some cases this may mean collecting
_less_ data, or thinking more about the privacy of the users.

For me, as a developer, even a list of packages sorted by popularity
(aka Debian/Ubuntu popcon) would be very useful.
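
To illustrate what I mean, here is a minimal sketch of such a ranking.
It assumes each host reports a list of installed package names; the
`reports` structure and the name `rank_by_popularity` are made up for
illustration and are not taken from the gentoostats code.

```python
from collections import Counter


def rank_by_popularity(reports):
    """Return (package, host_count) pairs, most widely installed first.

    `reports` maps a host identifier to an iterable of installed package
    names. This input shape is hypothetical, for illustration only.
    """
    counts = Counter()
    for packages in reports.values():
        # set() so a package listed twice by one host is counted once
        counts.update(set(packages))
    # Sort by descending host count, then by name for a deterministic order
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    sample = {
        "host-a": ["app-editors/vim", "sys-apps/portage"],
        "host-b": ["sys-apps/portage", "www-client/firefox"],
        "host-c": ["sys-apps/portage", "app-editors/vim"],
    }
    for package, hosts in rank_by_popularity(sample):
        print(f"{hosts:3d}  {package}")
```

Counting hosts rather than raw submissions keeps one machine that
reports often from skewing the ranking.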

Ah, and maybe files in /etc/portage: package.keywords and so on. It
could be useful to see what people are masking/unmasking, since that
may indicate stale stabilizations or brokenness hitting the tree.
Anyway, I'd call it an enhancement.

> Also, I'm starting work on the webUI, and would like some
> recommendations for stats pages, such as:
> 
> * Packages installed sorted by users

Cool!

> * Top arches, keywords, profiles

And percentage of ~arch vs arch users?

> * Most enabled, disabled useflags per package/globally

Also great, especially the per-package variant. It'd also be useful to
have per-profile data, to better tune the profile defaults.
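
A rough sketch of the per-profile, per-package tally I have in mind.
The tuple layout of `reports` and the name `use_flag_counts` are
assumptions for illustration; the real submission format may well
differ.

```python
from collections import Counter, defaultdict


def use_flag_counts(reports):
    """Count enabled USE flags per (profile, package) pair.

    `reports` is an iterable of (profile, package, enabled_flags) tuples,
    one per installed package per host. Hypothetical shape, for
    illustration only.
    """
    counts = defaultdict(Counter)
    for profile, package, enabled_flags in reports:
        # set() guards against a flag appearing twice in one report
        counts[(profile, package)].update(set(enabled_flags))
    return counts
```

Calling `most_common()` on the counter for a given (profile, package)
key then lists that package's most frequently enabled flags under that
profile.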

> [0]
> http://git.overlays.gentoo.org/gitweb/?p=proj/gentoostats.git;a=commit;h=1b9697a090515d2a373e83b1094d6e08ec405c02

I took a quick look at the code. Some random comments:

- it uses the portage Python API a lot. But that API is not stable, or
at least not guaranteed to be stable. Have you considered using helpers
like portageq (or eventually enhancing those helpers)?

- make the licensing super-clear (a LICENSE file, possibly some header
in every source file, and so on)

- how about submitting the data over HTTPS instead of HTTP to better
protect privacy?

- don't leave exception handling as a TODO; it should be a part of your
design, not an afterthought

- instead of or in addition to the setup.txt file, how about just
writing the real setup.py file for distutils?
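
To make the portageq and exception-handling points concrete, here is a
minimal sketch of wrapping `portageq envvar` with explicit error
handling. It assumes a reasonably recent Python 3; the names
`PortageqError` and `portageq_envvar`, and the injectable `runner`
parameter, are my own inventions for illustration, not part of portage
or of your code.

```python
import subprocess


class PortageqError(RuntimeError):
    """Raised when the portageq helper cannot be run or reports failure."""


def portageq_envvar(name, runner=subprocess.run):
    """Return the value of a Portage variable via `portageq envvar NAME`.

    `runner` is injectable only so the function can be exercised without
    a Gentoo system; by default it is subprocess.run.
    """
    try:
        result = runner(
            ["portageq", "envvar", name],
            capture_output=True,
            text=True,
            check=True,  # non-zero exit raises CalledProcessError
        )
    except FileNotFoundError as exc:
        raise PortageqError("portageq not found in PATH") from exc
    except subprocess.CalledProcessError as exc:
        raise PortageqError(
            f"portageq envvar {name} failed: {exc.stderr}"
        ) from exc
    return result.stdout.strip()
```

On a Gentoo host, `portageq_envvar("ARCH")` would return the configured
architecture. The client then has a single exception type to catch, so
it can decide in one place whether to skip a field or abort the
submission.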
