mikemccand commented on issue #106: URL: https://github.com/apache/lucene-jira-archive/issues/106#issuecomment-1203799408
OK I iterated some more on my silly "attempt to detect diff/patch/quoted code" and turned it into this VERY scratchy tool:

```
import re
import glob
import json

# A hunk header from a "normal" diff, e.g. "12,14c12,15" or "7d6", alone on a line
reDiffLines = re.compile(r'^[\d,]+[cd][\d,]+\s*$', re.MULTILINE)
# A literal "diff <something>" command (only used for the debug prints below)
reDiffCommand = re.compile(r'\s+diff\s+[^\s]+')
# A line starting with "< " or "> " that contains ");", i.e. likely quoted/diffed code
reLeadingLessGreaterThan = re.compile(r'^[<>] .*?\);.*?', re.MULTILINE)

results = []

for file_name in glob.glob('jira-dump/*.json'):
  with open(file_name) as f:
    d = json.load(f)
  jira_id = d['key']
  if not jira_id.startswith('LUCENE'):
    print(f'SKIPPING: {jira_id}')
    continue
  fields = d['fields']
  #print(fields.keys())
  desc = fields['description']
  #if desc is not None and ('\n---\n' in desc or reDiffLines.search(desc)):
  if desc is not None:
    count = len(reLeadingLessGreaterThan.findall(desc))
    # descriptions need at least 2 "<"/">" code lines (comments below need more than 2)
    if reDiffLines.search(desc) or count >= 2:
      #print(f'MATCH: {d["key"]}\n {desc}')
      #print(f'MATCH: {d["key"]}\n')
      results.append((jira_id, f'*** Diff in desc?: [{jira_id}](https://issues.apache.org/jira/browse/{jira_id})'))
      #print(repr(desc))
      if jira_id == 'LUCENE-825':
        # search, not match: the diff command need not be at the start of the text
        print(reDiffCommand.search(desc))
        print(repr(desc))
  for comment in fields['comment']['comments']:
    comment_text = comment['body']
    comment_id = comment['id']
    count = len(reLeadingLessGreaterThan.findall(comment_text))
    #if '\n---\n' in comment_text or reDiffLines.search(comment_text):
    if reDiffLines.search(comment_text) or count > 2:
      #print(f'MATCH: {d["key"]}\n {comment_text}')
      jira_comment_link = f'https://issues.apache.org/jira/browse/{jira_id}?focusedCommentId={comment_id}&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-{comment_id}'
      results.append((jira_id, f'*** Diff in comment?: [{jira_id}]({jira_comment_link})'))
      if jira_id == 'LUCENE-5400':
        print(reDiffCommand.search(comment_text))
        print(repr(comment_text))
      #print(repr(comment_text))

print(f'\n{len(results)} possible diffs:\n')
# sort by the numeric part of the key ("LUCENE-" is 7 characters)
for jira_id, text in sorted(results, key=lambda x: int(x[0][7:])):
  print(text)
```

It produces these results:

> 9 possible diffs:
>
> *** Diff in desc?: [LUCENE-108](https://issues.apache.org/jira/browse/LUCENE-108)
> *** Diff in desc?: [LUCENE-162](https://issues.apache.org/jira/browse/LUCENE-162)
> *** Diff in desc?: [LUCENE-327](https://issues.apache.org/jira/browse/LUCENE-327)
> *** Diff in comment?: [LUCENE-584](https://issues.apache.org/jira/browse/LUCENE-584?focusedCommentId=12487106&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12487106)
> *** Diff in comment?: [LUCENE-743](https://issues.apache.org/jira/browse/LUCENE-743?focusedCommentId=12520476&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12520476)
> *** Diff in desc?: [LUCENE-825](https://issues.apache.org/jira/browse/LUCENE-825)
> *** Diff in desc?: [LUCENE-5110](https://issues.apache.org/jira/browse/LUCENE-5110)
> *** Diff in comment?: [LUCENE-5400](https://issues.apache.org/jira/browse/LUCENE-5400?focusedCommentId=14136942&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14136942)
> *** Diff in comment?: [LUCENE-5934](https://issues.apache.org/jira/browse/LUCENE-5934?focusedCommentId=14128412&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14128412)
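For anyone who wants to poke at the two detection heuristics without a `jira-dump/` directory, here is a minimal sketch that applies the same two patterns to a few strings. The helper names `looks_like_diff` and `issue_number`, the `min_marked_lines` parameter, and all sample texts are hypothetical and made up for illustration; they are not part of the tool above or taken from any real Jira issue.

```python
import re

# Same two patterns as the scratch tool, written as raw strings.
RE_DIFF_LINES = re.compile(r'^[\d,]+[cd][\d,]+\s*$', re.MULTILINE)
RE_LEADING_LT_GT = re.compile(r'^[<>] .*?\);.*?', re.MULTILINE)


def looks_like_diff(text, min_marked_lines=2):
    """Return True if text has a normal-diff hunk header, or at least
    min_marked_lines lines that start with '< ' or '> ' and contain ');'."""
    if RE_DIFF_LINES.search(text):
        return True
    return len(RE_LEADING_LT_GT.findall(text)) >= min_marked_lines


def issue_number(jira_id):
    """Numeric part of a key like 'LUCENE-825', for numeric sorting."""
    return int(jira_id.split('-', 1)[1])


# Hypothetical samples, invented for this sketch.
hunk_sample = "Here is my change:\n12,14c12,15\n< old line\n---\n> new line\n"
quoted_sample = "< writer.close();\n> writer.commit();\n"
prose_sample = "I think a > b here, and the patch is attached."

print(looks_like_diff(hunk_sample))    # hunk header "12,14c12,15" is present
print(looks_like_diff(quoted_sample))  # two '<'/'>' lines containing ');'
print(looks_like_diff(prose_sample))   # neither heuristic fires

# A plain string sort would put 'LUCENE-5110' before 'LUCENE-825'.
print(sorted(['LUCENE-5110', 'LUCENE-825', 'LUCENE-108'], key=issue_number))
```

Note that the hunk-header pattern only covers `c` (change) and `d` (delete) hunks, so text containing only `a` (add) hunks such as `5a6,7` would need the second heuristic to be caught.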