MtDu has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/265769

Change subject: Make reflinks only fetch 1 MB of each linked document
......................................................................

Make reflinks only fetch 1 MB of each linked document

Bug: T124138
Change-Id: Icd83a1c59e8451bfb5a44fbdad45697c3847e47d
---
M scripts/reflinks.py
1 file changed, 3 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/pywikibot/core refs/changes/69/265769/1

diff --git a/scripts/reflinks.py b/scripts/reflinks.py
index be630cb..a5594b6 100755
--- a/scripts/reflinks.py
+++ b/scripts/reflinks.py
@@ -523,7 +523,7 @@
                 f = None
 
                 try:
-                    f = requests.get(ref.url, headers=headers, timeout=60)
+                    f = requests.get(ref.url, headers=headers, timeout=60, stream=True)
 
                     # Try to get Content-Type from server
                     contentType = f.headers.get('content-type')
@@ -582,7 +582,8 @@
                             new_text = new_text.replace(match.group(), repl)
                         continue
 
-                    linkedpagetext = f.content
+                    # Keep only the first ~1,000,000-byte chunk (~0.95 MiB)
+                    linkedpagetext = next(f.iter_content(1000000), b'')
                 except UnicodeError:
                     # example : http://www.adminet.com/jo/20010615¦/ECOC0100037D.html
                     # in [[fr:Cyanure]]
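
Note on the second hunk: `iter_content()` returns a generator, so its bytes have to be pulled out of it rather than the generator being assigned directly. Below is a minimal, offline-testable sketch of a byte-capped read, assuming the chunks come from requests' `Response.iter_content()` on a response fetched with `stream=True`. The helper `read_capped` and its signature are hypothetical, not part of pywikibot or requests.

```python
from typing import Iterable


def read_capped(chunks: Iterable[bytes], max_bytes: int = 1_000_000) -> bytes:
    """Collect bytes from `chunks` until `max_bytes` have been gathered.

    Hypothetical helper: `chunks` may be any iterable of bytes, for example
    the generator returned by requests' Response.iter_content() when the
    request was made with stream=True.
    """
    if max_bytes < 0:
        raise ValueError("max_bytes must be non-negative")
    if max_bytes == 0:
        return b""  # nothing wanted: do not pull any data at all
    buf = bytearray()
    for chunk in chunks:
        # Keep only as much of this chunk as still fits under the cap.
        buf += chunk[: max_bytes - len(buf)]
        if len(buf) >= max_bytes:
            break  # stop early so no further chunks are fetched
    return bytes(buf)


if __name__ == "__main__":
    # Offline demo: five fake 400,000-byte chunks stand in for a large page.
    fake_chunks = (bytes([65 + i]) * 400_000 for i in range(5))
    print(len(read_capped(fake_chunks)))  # prints 1000000
```

With requests this could be called as `read_capped(resp.iter_content(chunk_size=64 * 1024))` inside `with requests.get(url, stream=True, timeout=60) as resp:`. The `with` block (or an explicit `resp.close()`) matters because a streamed response only releases its connection back to the pool once the body is fully read or the response is closed. Using smaller chunks plus an explicit cap also avoids depending on how large the first chunk from `iter_content(1000000)` actually turns out to be.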

-- 
To view, visit https://gerrit.wikimedia.org/r/265769

Gerrit-MessageType: newchange
Gerrit-Change-Id: Icd83a1c59e8451bfb5a44fbdad45697c3847e47d
Gerrit-PatchSet: 1
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Owner: MtDu

_______________________________________________
MediaWiki-commits mailing list
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
