jenkins-bot has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/329761 )

Change subject: Fix and improve default regexes
......................................................................


Fix and improve default regexes

- Remove the superfluous multiline flag from 'pre', 'score' and 'ref'.
- Simplify 'header' by anchoring it with the multiline flag.
- Expand 'pre' and 'table' to support HTML attributes (mostly 'style').
- Update 'property' to support parameters (the parser function currently
  accepts "|from=", and more may follow).
- Localize 'property' and 'invoke' using magic words.
- Add the singleline (DOTALL) flag to 'invoke'.

Change-Id: Ib805bf70cb1cc99711138d7d6c7e40971f31b602
---
M pywikibot/textlib.py
1 file changed, 10 insertions(+), 8 deletions(-)

Approvals:
  jenkins-bot: Verified
  Xqt: Looks good to me, approved
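
The effect of the 'header' and 'pre' changes described in the commit message can be checked with a short sketch. The four patterns are copied from the diff; the sample wikitext and the variable names are made up for illustration and are not part of pywikibot.

```python
import re

# Patterns copied from the diff (old and new variants).
OLD_HEADER = re.compile(r'\r?\n=+.+=+ *\r?\n')
NEW_HEADER = re.compile(r'(?m)^=+.+=+ *$')
OLD_PRE = re.compile(r'(?ism)<pre>.*?</pre>')
NEW_PRE = re.compile(r'(?is)<pre[ >].*?</pre>')

# Hypothetical sample text: a header on the very first line, a second
# header further down, and a <pre> tag that carries an attribute.
text = ('== First ==\n'
        'intro\n'
        '== Second ==\n'
        '<pre style="color:red">x\ny</pre>\n')

# The old pattern needs a newline before the header, so it misses a
# header on the first line and also swallows the surrounding newlines.
old_headers = [m.group().strip() for m in OLD_HEADER.finditer(text)]

# The new pattern anchors on line boundaries and matches only the
# header line itself.
new_headers = [m.group() for m in NEW_HEADER.finditer(text)]

# The old 'pre' pattern requires a bare "<pre>"; the new one also
# accepts "<pre " followed by attributes.
old_pre = OLD_PRE.search(text)
new_pre = NEW_PRE.search(text)

print(old_headers)
print(new_headers)
print(old_pre, new_pre.group() if new_pre else None)
```

Dropping "m" from 'pre', 'score' and 'ref' should not change what they match, since none of those patterns contains "^" or "$".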



diff --git a/pywikibot/textlib.py b/pywikibot/textlib.py
index 9f7782e..dce6608 100644
--- a/pywikibot/textlib.py
+++ b/pywikibot/textlib.py
@@ -221,13 +221,13 @@
     _regex_cache.update({
         'comment':      re.compile(r'(?s)<!--.*?-->'),
         # section headers
-        'header':       re.compile(r'\r?\n=+.+=+ *\r?\n'),
+        'header':       re.compile(r'(?m)^=+.+=+ *$'),
         # preformatted text
-        'pre':          re.compile(r'(?ism)<pre>.*?</pre>'),
+        'pre':          re.compile(r'(?is)<pre[ >].*?</pre>'),
         'source':       re.compile(r'(?is)<source .*?</source>'),
-        'score':        re.compile(r'(?ism)<score[ >].*?</score>'),
+        'score':        re.compile(r'(?is)<score[ >].*?</score>'),
         # inline references
-        'ref':          re.compile(r'(?ism)<ref[ >].*?</ref>'),
+        'ref':          re.compile(r'(?is)<ref[ >].*?</ref>'),
         'template':     NESTED_TEMPLATE_REGEX,
         # lines that start with a space are shown in a monospace font and
         # have whitespace preserved.
@@ -235,7 +235,7 @@
         # tables often have whitespace that is used to improve wiki
         # source code readability.
         # TODO: handle nested tables.
-        'table':        re.compile(r'(?ims)^{\|.*?^\|}|<table>.*?</table>'),
+        'table':        re.compile(r'(?ims)^{\|.*?^\|}|<table[ >].*?</table>'),
         'hyperlink':    compileLinkR(),
         'gallery':      re.compile(r'(?is)<gallery.*?>.*?</gallery>'),
         # this matches internal wikilinks, but also interwiki, categories, and
@@ -247,11 +247,13 @@
                              site.validLanguageLinks() +
                              list(site.family.obsolete.keys()))),
         # Wikibase property inclusions
-        'property':     re.compile(r'(?i)\{\{\s*#property:\s*p\d+\s*\}\}'),
+        'property':     (r'(?i)\{\{\s*\#(?:%s):\s*p\d+.*?\}\}',
+                         lambda site: '|'.join(site.getmagicwords('property'))),
         # Module invocations (currently only Lua)
-        'invoke':       re.compile(r'(?i)\{\{\s*#invoke:.*?}\}'),
+        'invoke':       (r'(?is)\{\{\s*\#(?:%s):.*?\}\}',
+                         lambda site: '|'.join(site.getmagicwords('invoke'))),
         # categories
-        'category':     ('\[\[ *(?:%s)\s*:.*?\]\]',
+        'category':     (r'\[\[ *(?:%s)\s*:.*?\]\]',
                          lambda site: '|'.join(site.namespaces[14])),
         # files
         'file':         (FILE_LINK_REGEX,
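
A rough sketch of how the new (template, callable) entries for 'property' and 'invoke' might be turned into compiled regexes. The two entries are copied from the diff; FakeSite, its alias lists and compile_for_site are hypothetical stand-ins, and the assumption that textlib fills "%s" with the callable's result before compiling is not shown in this change.

```python
import re


class FakeSite:
    """Hypothetical stand-in for a pywikibot site object."""

    # Invented alias lists; a real site reports its own magic words.
    _magicwords = {
        'property': ['property', 'vlastnost'],
        'invoke': ['invoke'],
    }

    def getmagicwords(self, word):
        return self._magicwords[word]


# Entries copied from the diff: (pattern template, alias provider).
ENTRIES = {
    'property': (r'(?i)\{\{\s*\#(?:%s):\s*p\d+.*?\}\}',
                 lambda site: '|'.join(site.getmagicwords('property'))),
    'invoke': (r'(?is)\{\{\s*\#(?:%s):.*?\}\}',
               lambda site: '|'.join(site.getmagicwords('invoke'))),
}


def compile_for_site(name, site):
    """Fill the template with the site's aliases and compile it.

    Assumed resolution step, written for this sketch only.
    """
    template, get_aliases = ENTRIES[name]
    return re.compile(template % get_aliases(site))


site = FakeSite()
prop = compile_for_site('property', site)
invoke = compile_for_site('invoke', site)

# 'property' now tolerates parameters such as "|from=".
with_param = prop.search('a {{#property:P17|from=Q42}} b')
# A localized alias matches as well (case-insensitively).
localized = prop.search('{{#Vlastnost:P17}}')

# With the singleline flag, an invocation may span several lines.
call = '{{#invoke:Foo|bar\n|baz=1}} tail'
new_invoke = invoke.search(call)
# The old pattern from the diff stops at the first newline.
old_invoke = re.search(r'(?i)\{\{\s*#invoke:.*?}\}', call)

print(with_param.group(), localized.group())
print(new_invoke.group(), old_invoke)
```

Keeping the alias lookup in a callable defers it until a site is known, which is the same shape the existing 'category' entry uses with site.namespaces[14].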

-- 
To view, visit https://gerrit.wikimedia.org/r/329761
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ib805bf70cb1cc99711138d7d6c7e40971f31b602
Gerrit-PatchSet: 5
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Owner: Matěj Suchánek <matejsuchane...@gmail.com>
Gerrit-Reviewer: Dalba <dalba.w...@gmail.com>
Gerrit-Reviewer: Ladsgroup <ladsgr...@gmail.com>
Gerrit-Reviewer: Magul <tomasz.magul...@gmail.com>
Gerrit-Reviewer: Matěj Suchánek <matejsuchane...@gmail.com>
Gerrit-Reviewer: Xqt <i...@gno.de>
Gerrit-Reviewer: jenkins-bot <>

_______________________________________________
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
