Your message dated Thu, 28 Nov 2013 23:45:51 +0100
with message-id <20131128224551.GA27705@yellowpig>
and subject line Re: [Popcon-developers] Bug#730620: popularity-contest: Please
improve machine readability of the raw data
has caused the Debian Bug report #730620,
regarding popularity-contest: Please improve machine readability of the raw data
to be marked as done.
This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.
(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact [email protected]
immediately.)
--
730620: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=730620
Debian Bug Tracking System
Contact [email protected] with problems
--- Begin Message ---
Package: popularity-contest
Severity: normal
Hi,
I'm trying to parse the raw statistics available on
http://popcon.debian.org/ and ran into some problems.
Firstly, while it is fine that the fields are separated by multiple
spaces, field values should then not contain spaces themselves.
Unfortunately they do, for example in the package name "Not in Sid".
This is similar to the request in bug report #574743, which asks for
package names to be sanitized before they are put into the
statistics.
Secondly, and related to package name sanitization (the lack of which,
as the above example shows, can make the data unparsable), some obviously
bogus entries such as "Not in Sid" can be removed entirely. No such
package exists. If you want to include the information, it would be
better to put it in a commented line, as you do for the header of the
file, where you use # as the comment character.
Thirdly, at the end of the file there is one long line consisting only
of minus characters. Could this line also be commented out with a #?
The same goes for the very last line, which presents a total. It is
not necessary to put a "rank" on this line (it has the same rank as
the last entry), and it is not necessary to have this line at all,
because any machine parsing the rest of the file can easily generate it.
If you want this line for human consumption, you can simply prefix
it with a # to make it a comment.
Would you welcome a patch fixing these issues?
cheers, josch
--- End Message ---
--- Begin Message ---
On Wed, Nov 27, 2013 at 03:31:38PM +0100, Johannes Schauer wrote:
> Hi,
>
> Quoting Bill Allombert (2013-11-27 13:47:26)
> > All of this is irrelevant: what actual problem are you trying to solve ?
>
> I need the popcon values per source package.
Since the popcon reports do not include the names of the source packages,
the per-source data is generated by crossing the raw popcon data with the
Packages files in a Debian archive mirror.
This is not perfect (we do not know the version of the package, which is
important when source packages are renamed), and in particular, if a package is
not in the Debian archive, we cannot find its source package, so we report it
as 'Not in Sid'.
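The crossing described above could look roughly like the following
Python sketch. It is not the actual popcon code. The function names
binary_to_source and installs_per_source are hypothetical, and the
sketch assumes a Packages file whose stanzas are separated by blank
lines, where a missing Source field means the source package has the
same name as the binary package.

```python
from collections import Counter


def binary_to_source(packages_text):
    """Map binary package names to source package names.

    Assumes stanzas separated by blank lines (hypothetical helper).
    """
    mapping = {}
    for stanza in packages_text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            # Skip continuation lines (leading whitespace) and junk.
            if line[:1] in (" ", "\t") or ":" not in line:
                continue
            key, _, value = line.partition(":")
            fields[key] = value.strip()
        if "Package" not in fields:
            continue
        # "Source: name (version)" may carry a version; keep the name only.
        source = fields.get("Source", fields["Package"]).split()[0]
        mapping[fields["Package"]] = source
    return mapping


def installs_per_source(popcon, mapping):
    """Sum install counts per source package.

    Binaries missing from the mapping are grouped under 'Not in Sid'.
    """
    totals = Counter()
    for binary, inst in popcon.items():
        totals[mapping.get(binary, "Not in Sid")] += inst
    return totals
```

Binary packages that the mirror does not know about all end up under
the single 'Not in Sid' key, which is why that entry is not a real
source package name.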
> > I can write a 10-line perl script to parse each of them. Maybe other people
> > have done so and changing the format will cause them trouble. So there is a
> > need for a clear benefit before changing the format.
>
> My workaround is to split each line at the spaces and discard all lines which
> do not split into exactly seven elements.
Yes, this is a good solution.
You should also stop reading when a line starts with '-' to avoid the 'Total'
line.
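The workaround discussed here can be sketched in Python as follows.
This is only an illustration under stated assumptions: the function
name parse_popcon is hypothetical, and the seven fields are assumed to
be rank, name and five integer counters.

```python
def parse_popcon(lines):
    """Yield (name, counts) for each well-formed data line.

    Assumes seven whitespace-separated fields per line: rank, name,
    then five integer counters (an assumption about the raw format).
    """
    for line in lines:
        if line.startswith("-"):
            # The dashed separator precedes the 'Total' line: stop here.
            break
        if line.startswith("#"):
            # Header lines are comments.
            continue
        fields = line.split()
        if len(fields) != 7:
            # For example 'Not in Sid', whose name contains spaces.
            continue
        _rank, name, *counts = fields
        yield name, tuple(int(c) for c in counts)
```

Usage would be along the lines of

    with open("popcon-raw.txt") as f:    # hypothetical local copy
        rows = list(parse_popcon(f))

where the 'Not in Sid' entry is dropped because it splits into more
than seven fields, and the 'Total' line is never reached.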
> If you do not deem it worthwhile to make this kind of workaround unnecessary
> by just making sure that delimiter characters do not appear in field values,
> then just close this bug report.
Having spaces in 'Not in Sid' prevents scripts from reading it as a true source
package name. The ---- at the end allows a script to stop before processing the
'Total' line. The total line is useful for human readers who do addition slowly.
Thanks for your interest in popularity-contest!
--
Bill. <[email protected]>
Imagine a large red swirl here.
--- End Message ---