jenkins-bot has submitted this change and it was merged. ( 
https://gerrit.wikimedia.org/r/365281 )

Change subject: Support url datatype in harvest_template.py
......................................................................


Support url datatype in harvest_template.py

Use Pywikibot's native regex for matching external links.
Shame this is implemented in 2017.

Change-Id: I1645857a5eb8765d9eff1909f51b51035fb2396d
---
M scripts/harvest_template.py
1 file changed, 7 insertions(+), 1 deletion(-)

Approvals:
  jenkins-bot: Verified
  Xqt: Looks good to me, approved



diff --git a/scripts/harvest_template.py b/scripts/harvest_template.py
index b9954af..4957e39 100755
--- a/scripts/harvest_template.py
+++ b/scripts/harvest_template.py
@@ -52,7 +52,7 @@
 signal.signal(signal.SIGINT, _signal_handler)
 
 import pywikibot
-from pywikibot import pagegenerators as pg, WikidataBot
+from pywikibot import pagegenerators as pg, WikidataBot, textlib
 
 docuReplacements = {'&params;': pywikibot.pagegenerators.parameterHelp}
 
@@ -80,6 +80,7 @@
         self.fields = fields
         self.cacheSources()
         self.templateTitles = self.getTemplateSynonyms(self.templateTitle)
+        self.linkR = textlib.compileLinkR()
 
     def getTemplateSynonyms(self, title):
         """Fetch redirects of the title, so we can check against them."""
@@ -185,6 +186,11 @@
                                 claim.setTarget(linked_item)
                             elif claim.type in ('string', 'external-id'):
                                 claim.setTarget(value.strip())
+                            elif claim.type == 'url':
+                                match = self.linkR.search(value)
+                                if not match:
+                                    continue
+                                claim.setTarget(match.group('url'))
                             elif claim.type == 'commonsMedia':
                                 commonssite = pywikibot.Site('commons',
                                                              'commons')

-- 
To view, visit https://gerrit.wikimedia.org/r/365281
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I1645857a5eb8765d9eff1909f51b51035fb2396d
Gerrit-PatchSet: 1
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Owner: Matěj Suchánek <matejsuchane...@gmail.com>
Gerrit-Reviewer: Dalba <dalba.w...@gmail.com>
Gerrit-Reviewer: John Vandenberg <jay...@gmail.com>
Gerrit-Reviewer: Xqt <i...@gno.de>
Gerrit-Reviewer: jenkins-bot <>

_______________________________________________
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits

Reply via email to