mikemccand commented on issue #106:
URL: 
https://github.com/apache/lucene-jira-archive/issues/106#issuecomment-1203799408

   OK I iterated some more on my silly "attempt to detect diff/patch/quoted 
code" and ended up with this VERY scratchy tool:
   
   ```
   import os
   import re
   import glob
   import json
   
   reDiffLines = re.compile(r'^[\d,]+[cd][\d,]+\s*$', re.MULTILINE)
   reDiffCommand = re.compile(r'\s+diff\s+[^\s]+')
   reLeadingLessGreaterThan = re.compile(r'^[<>] .*?\);.*?', re.MULTILINE)
   
   results = []
   
   for file_name in glob.glob('jira-dump/*.json'):
       with open(file_name) as f:
           d = json.load(f)
       jira_id = d['key']
       if not jira_id.startswith('LUCENE'):
           print(f'SKIPPING: {jira_id}')
           continue
       fields = d['fields']
       #print(fields.keys())
       desc = fields['description']
       #if desc is not None and ('\n---\n' in desc or reDiffLines.search(desc)):
       if desc is not None:
           count = len(reLeadingLessGreaterThan.findall(desc))
           if reDiffLines.search(desc) or count >= 2:
               #print(f'MATCH: {d["key"]}\n  {desc}')
               #print(f'MATCH: {d["key"]}\n')
               results.append((jira_id, f'*** Diff in desc?: [{jira_id}](https://issues.apache.org/jira/browse/{jira_id})'))
               #print(repr(desc))
               if jira_id == 'LUCENE-825':
                   print(reDiffCommand.search(desc))
                   print(repr(desc))
       for comment in fields['comment']['comments']:
           comment_text = comment['body']
           comment_id = comment['id']
           count = len(reLeadingLessGreaterThan.findall(comment_text))
           #if '\n---\n' in comment_text or reDiffLines.search(comment_text):
           if reDiffLines.search(comment_text) or count > 2:
               #print(f'MATCH: {d["key"]}\n  {comment_text}')
               jira_comment_link = f'https://issues.apache.org/jira/browse/{jira_id}?focusedCommentId={comment_id}&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-{comment_id}'
               results.append((jira_id, f'*** Diff in comment?: [{jira_id}]({jira_comment_link})'))
               if jira_id == 'LUCENE-5400':
                   print(reDiffCommand.search(comment_text))
                   print(repr(comment_text))
               #print(repr(comment_text))
   
   print(f'\n{len(results)} possible diffs:\n')
   for jira_id, text in sorted(results, key=lambda x: int(x[0][7:])):
       print(text)
   ```
   
   It produces these results:
   
   > 9 possible diffs:
   
   *** Diff in desc?: [LUCENE-108](https://issues.apache.org/jira/browse/LUCENE-108)
   *** Diff in desc?: [LUCENE-162](https://issues.apache.org/jira/browse/LUCENE-162)
   *** Diff in desc?: [LUCENE-327](https://issues.apache.org/jira/browse/LUCENE-327)
   *** Diff in comment?: [LUCENE-584](https://issues.apache.org/jira/browse/LUCENE-584?focusedCommentId=12487106&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12487106)
   *** Diff in comment?: [LUCENE-743](https://issues.apache.org/jira/browse/LUCENE-743?focusedCommentId=12520476&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12520476)
   *** Diff in desc?: [LUCENE-825](https://issues.apache.org/jira/browse/LUCENE-825)
   *** Diff in desc?: [LUCENE-5110](https://issues.apache.org/jira/browse/LUCENE-5110)
   *** Diff in comment?: [LUCENE-5400](https://issues.apache.org/jira/browse/LUCENE-5400?focusedCommentId=14136942&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14136942)
   *** Diff in comment?: [LUCENE-5934](https://issues.apache.org/jira/browse/LUCENE-5934?focusedCommentId=14128412&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14128412)
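
To make the heuristic easier to poke at in isolation, here is a minimal sketch that reuses the two detection patterns from the script. The helper `looks_like_diff`, its `min_marked_lines` parameter, and the three sample strings are hypothetical, made up for illustration and not taken from any real Jira issue. The script itself flags descriptions at `count >= 2` and comments at `count > 2`, so pass `min_marked_lines=3` to mimic the comment check.

```python
import re

# Patterns copied from the script above, written as raw strings.
reDiffLines = re.compile(r'^[\d,]+[cd][\d,]+\s*$', re.MULTILINE)
reLeadingLessGreaterThan = re.compile(r'^[<>] .*?\);.*?', re.MULTILINE)


def looks_like_diff(text, min_marked_lines=2):
    """Hypothetical helper: True if text has a diff hunk header line
    (e.g. '12,14c12,15') or enough '<'/'>' prefixed code-like lines."""
    if text is None:
        return False
    marked = len(reLeadingLessGreaterThan.findall(text))
    return bool(reDiffLines.search(text)) or marked >= min_marked_lines


# Made-up sample texts (assumptions, not real issue content).
hunk_sample = "Here is my change:\n12,14c12,15\n< int x = 1;\n---\n> int x = 2;\n"
quoted_sample = "> foo.close();\n> bar.close();\n"
prose_sample = "I think we should change the merge policy."

for sample in (hunk_sample, quoted_sample, prose_sample):
    print(looks_like_diff(sample))
```

Note that the `<`/`>` pattern only counts lines containing `);`, so it is biased toward quoted Java statements and will miss diffs that touch other kinds of lines.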


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

