On Fri, Feb 10, 2017 at 1:46 AM, Brett Cannon <br...@python.org> wrote:
> As of right now Senthil is (I'm assuming) generating another test repo right
> now to see the results. If we're happy with the output then we will go with
> it, else we will skip the rewrite. So you're not too late as in a final
> decision has been made, but then you could argue you're too early based on
> how this test goes. :)
>

No need to wait, I put together a script that shows the result of the
rewriting :)
The script checks against roundup, so invalid ids are not converted.
I'm currently tweaking it as I find issues, so it's still a bit messy,
but this should give you a good idea.
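
As a rough illustration of the roundup check, here is a minimal sketch of the substitution step. The regex is the one from the attached script; the `convert` helper and the one-element `valid_ids` set are hypothetical stand-ins for the script's callback and for the ids fetched from roundup.

```python
import re

# Same pattern as in the attached script.
ISSUE_RE = re.compile(
    r'(?:(?:(?<!org/)issues?|bugs?|SF)(?:\s+id)?\s*#?|#)\s*(\d+)',
    flags=re.IGNORECASE)

# Hypothetical stand-in for the ids retrieved from roundup.
valid_ids = {'12345'}


def convert(message, valid_ids):
    """Rewrite issue references whose id is known; leave the rest alone."""
    def callback(match):
        issue_id = match.group(1)
        if issue_id not in valid_ids:
            # Unknown id: return the matched text unchanged.
            return match.group(0)
        return 'bpo-%s' % issue_id
    return ISSUE_RE.sub(callback, message)


print(convert('Fix issue #12345 and bug #99999999.', valid_ids))
```

Only the reference whose id is in the set is rewritten; the other one is returned as it was.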

Drop the attached script in the root of your cpython clone and run it with:
  python convcm.py (to see 100 random converted commit messages)
  python convcm.py csid1 csid2 (to see the specified changeset ids)
The script will create an output.html file that shows the before/after.
It will also cache the valid ids in a valid_ids.json file.

There are still a few ambiguous cases, in particular:
* I left "SF", so "SF issue #12345" becomes "SF bpo-12345";
* I left "patch", so "Apply patch #12345" becomes "Apply patch bpo-12345".
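
The two cases above can be reproduced with the regex alone. This is only a sketch: it skips the roundup validity check and replaces every match, so it shows what the pattern consumes rather than what the full script outputs.

```python
import re

# Same pattern as in the attached script.
ISSUE_RE = re.compile(
    r'(?:(?:(?<!org/)issues?|bugs?|SF)(?:\s+id)?\s*#?|#)\s*(\d+)',
    flags=re.IGNORECASE)

samples = ['SF issue #12345', 'Apply patch #12345']

# Replace every match unconditionally (no validity check in this sketch).
results = [ISSUE_RE.sub(r'bpo-\1', s) for s in samples]

for before, after in zip(samples, results):
    print('%-20s -> %s' % (before, after))
```

In the first sample the match starts at "issue", so the leading "SF" is left in place; in the second only "#12345" matches, so "patch" is kept.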

Senthil is also testing the conversion using this regex.

Best Regards,
Ezio Melotti
from __future__ import print_function

import re
import sys
import json
import random
import xmlrpclib

from mercurial import ui, hg

# open the Mercurial repository in the current directory
u = ui.ui()
repo = hg.repository(u, '.')


print('Retrieving bpo ids    ...', end=' ')
try:
    with open('valid_ids.json') as f:
        valid_ids = json.load(f)
except IOError:
    url = 'http://bugs.python.org/xmlrpc'
    roundup = xmlrpclib.ServerProxy(url)
    valid_ids = roundup.list('issue', 'id')
    with open('valid_ids.json', 'w') as f:
        json.dump(valid_ids, f)
valid_ids = set(valid_ids)
print('[done]')

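# Match "issue 123", "issues #123", "bug id 123", "SF #123" or a bare "#123",
# capturing the number; "issue" preceded by "org/" (as in a URL) is skipped.
r = r'(?:(?:(?<!org/)issues?|bugs?|SF)(?:\s+id)?\s*#?|#)\s*(\d+)'
regex = re.compile(r, flags=re.MULTILINE|re.IGNORECASE)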


N = 100

if len(sys.argv) == 1:
    print('Generating revs sample...', end=' ')
    revs = random.sample(repo, N)
    print('[done]')
else:
    revs = [repo[rev] for rev in sys.argv[1:]]

# uncomment to run on all the revs
#revs = [repo[rev] for rev in repo]

def re_cb(match):
    id = match.group(1)
    if id not in valid_ids:
        return match.group(0)
    else:
        return '<b>bpo-%s</b>' % id


print('Generating output     ...', end=' ')
with open('output.html', 'w') as f:
    for n, rev in enumerate(revs):
        desc = repo[rev].description()
        newdesc = regex.sub(re_cb, desc)
        f.write('<pre>[<b>%s</b>]: %s\n' % (rev, desc))
        f.write('[<b>%s</b>]: %s</pre>\n<hr>\n' % (rev, newdesc))
print('[done]')
print(len(revs), 'revisions converted')


unusual = """
af811172717d 76a9a5131aae 31342913fb1e c4dd30b5d07e 3094843e7b92 e8940d4cd8ca 6d1e8162e855 27b698395d35 077d29384399 81ce9d412a4c c6df85e1d42e fedd6ccc5e5b 4ca32e4f7839 6db0a62b6aa6 69ac672b49b3 a6bcf4df1a85 a74b463bf76d 756c27efe193 df4943d24cb6 04f2801d9977 d9d69060f5e4
"""
ambiguous = """
0e8077cb3dd5 04f2801d9977 76a9a5131aae 3094843e7b92 e8940d4cd8ca fedd6ccc5e5b a4d869ecef33 bd2aa0247ada
"""
invalid = """
d2ae5affde14 329b28a85947
"""
broken = """
"""
_______________________________________________
core-workflow mailing list
core-workflow@python.org
https://mail.python.org/mailman/listinfo/core-workflow
This list is governed by the PSF Code of Conduct: 
https://www.python.org/psf/codeofconduct
