[ https://issues.apache.org/jira/browse/COMDEV-292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebb resolved COMDEV-292.
-------------------------
    Resolution: Fixed

The script has been updated to drop stats for lists that don't appear on the index page.
This is probably as much as can be done here.

> Mailglomper does not handle renamed lists well
> ----------------------------------------------
>
>                 Key: COMDEV-292
>                 URL: https://issues.apache.org/jira/browse/COMDEV-292
>             Project: Community Development
>          Issue Type: Bug
>          Components: Reporter Tool
>            Reporter: Sebb
>            Priority: Major
>
> The mailglomper script does not take account of renamed mailing lists.
> This can result in double counting the activity for a project.
> For example, commits@libcloud was renamed to notifications@libcloud in March 2014.
> However the data in the maildata_extended.json file includes weekly epoch entries
> for commits:
>   1507161600 2017-10-05 00:00:00 UTC
> to
>   1524096000 2018-04-19 00:00:00 UTC
> whereas notifications has:
>   1515024000 2018-01-04 00:00:00 UTC
> to
>   1531958400 2018-07-19 00:00:00 UTC
> The weekly counts agree for the overlap period.
> If the commits mbox files were still present up to April 2018, there would be
> an index entry for the list, and if there was also a redirect in place, the
> code would see the redirected files.
> I think the code should probably ignore redirects if that's possible.
> When a list is renamed, the old data ought to be dropped, otherwise it may be
> double-counted.
> Also the obsolete entries will gradually accumulate.
> This applies to both the maildata_weekly.json and maildata_extended.json
> files.
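
For readers following along, the approach described in the resolution (dropping stats for lists that no longer appear on the index page) might look roughly like the sketch below. This is not the actual mailglomper code. The function names, the `libcloud-commits` style keys, the shape of the stats dictionary, and the message counts are all assumptions made for illustration; only the epoch values come from the issue text.

```python
from datetime import datetime, timezone


def prune_stale_lists(stats, indexed_lists):
    """Return a copy of `stats` keeping only lists that are on the index page.

    `stats` is assumed (hypothetically) to map a list name to a dict of
    weekly epoch -> message count; the real JSON layout may differ.
    """
    live = set(indexed_lists)
    return {name: weeks for name, weeks in stats.items() if name in live}


def epoch_to_utc(ts):
    """Format a Unix timestamp the way the issue text shows it."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return dt.strftime("%Y-%m-%d %H:%M:%S UTC")


# Hypothetical data: epochs are from the issue, counts are made up.
stats = {
    "libcloud-commits": {"1507161600": 12, "1524096000": 7},
    "libcloud-notifications": {"1515024000": 12, "1531958400": 9},
}

# Assumed result of scanning the index page after the rename.
indexed = ["libcloud-notifications"]

pruned = prune_stale_lists(stats, indexed)
for name, weeks in pruned.items():
    first, last = min(weeks, key=int), max(weeks, key=int)
    print(name, epoch_to_utc(int(first)), "to", epoch_to_utc(int(last)))
```

Filtering against the index page removes the renamed list's leftover entries, so they are no longer double-counted and do not accumulate. It does not address the redirect case raised in the issue.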