XZise has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/158848

Change subject: [FIX] Http: Allow custom encoding (or none)
......................................................................

[FIX] Http: Allow custom encoding (or none)

If a non-textual file is requested via httplib2, it shouldn't be
decoded. This also adds the ability to use a specific encoding and
warns if it differs from the encoding given in the response header.

Change-Id: I03609e3d6ec9d8b7f72819358c11d62a792bf4c0
---
M pywikibot/comms/http.py
1 file changed, 21 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/pywikibot/core refs/changes/48/158848/1

diff --git a/pywikibot/comms/http.py b/pywikibot/comms/http.py
index 25af2a1..239117e 100644
--- a/pywikibot/comms/http.py
+++ b/pywikibot/comms/http.py
@@ -24,6 +24,7 @@
 import sys
 import atexit
 import time
+import codecs
 
 # Verify that a working httplib2 is present.
 try:
@@ -204,7 +205,7 @@
     return formatted
 
 
-def request(site, uri, ssl=False, *args, **kwargs):
+def request(site, uri, ssl=False, encoding=True, *args, **kwargs):
     """Queue a request to be submitted to Site.
 
     All parameters not listed below are the same as
@@ -217,8 +218,12 @@
     @param site: The Site to connect to
     @param uri: the URI to retrieve (relative to the site's scriptpath)
     @param ssl: Use HTTPS connection
-    @return: The received data (a unicode string).
-
+    @param encoding: Either a valid encoding (usable for str.encode()) or True
+        to automatically choose the encoding from the returned header (defaults
+        to ASCII) or False to skip decoding.
+    @return: The received data
+    @rtype: a unicode string, if encoding is not False, str (Python 2) or
+        bytes (Python 3) otherwise
     """
     if site:
         if ssl:
@@ -261,10 +266,20 @@
     pos = request.data[0]['content-type'].find('charset=')
     if pos >= 0:
         pos += len('charset=')
-        encoding = request.data[0]['content-type'][pos:]
-    else:
+        header_encoding = request.data[0]['content-type'][pos:]
+        if encoding is True:
+            encoding = header_encoding
+        elif codecs.lookup(encoding) != codecs.lookup(header_encoding):
+            pywikibot.warning(u'Encoding "{0}" requested but "{1}" received in '
+                               'the header.'.format(encoding, header_encoding))
+    elif encoding is True:
         encoding = 'ascii'
         # Don't warn, many pages don't contain one
         pywikibot.log(u"Http response doesn't contain a charset.")
 
-    return request.data[1].decode(encoding)
+    if encoding is not False:
+        data = request.data[1].decode(encoding)
+    else:
+        data = request.data[1]
+
+    return data
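
For readers who want to try the three-way `encoding` switch outside of
pywikibot, the following is a minimal standalone sketch, not the code from
the change itself. The names `charset_from_content_type` and `decode_body`
are hypothetical, the standard `logging` module stands in for
`pywikibot.warning` and `pywikibot.log`, and codec names are compared via
`codecs.lookup(...).name` instead of comparing the lookup results directly.
The sketch also returns early for `encoding=False`, on the assumption that a
boolean should never reach `codecs.lookup()`, which accepts only strings.

```python
import codecs
import logging

logger = logging.getLogger(__name__)


def charset_from_content_type(content_type):
    """Return the charset named in a Content-Type value, or None."""
    marker = 'charset='
    pos = content_type.find(marker)
    if pos < 0:
        return None
    # Keep only the charset token: drop later parameters and any quotes.
    value = content_type[pos + len(marker):].split(';')[0]
    return value.strip().strip('"')


def decode_body(body, content_type, encoding=True):
    """Decode a response body (bytes) according to `encoding`.

    encoding=True  -> use the header charset, falling back to ASCII
    encoding=False -> return the raw bytes untouched
    encoding=<str> -> use that codec, warning if the header disagrees
    """
    if encoding is False:
        # Binary payload (hypothetical example: an image); do not decode.
        return body

    header_encoding = charset_from_content_type(content_type)
    if encoding is True:
        if header_encoding is None:
            # Not a warning: many responses carry no charset at all.
            logger.info("Http response doesn't contain a charset.")
        encoding = header_encoding or 'ascii'
    elif header_encoding is not None:
        # codecs.lookup() resolves aliases, so 'utf8' matches 'UTF-8'.
        requested = codecs.lookup(encoding).name
        received = codecs.lookup(header_encoding).name
        if requested != received:
            logger.warning('Encoding "%s" requested but "%s" received in '
                           'the header.', encoding, header_encoding)

    return body.decode(encoding)
```

Under these assumptions, `decode_body(raw, 'image/png', encoding=False)`
hands back the bytes unchanged, while omitting `encoding` reproduces the
automatic header-based behaviour.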

-- 
To view, visit https://gerrit.wikimedia.org/r/158848
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I03609e3d6ec9d8b7f72819358c11d62a792bf4c0
Gerrit-PatchSet: 1
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Owner: XZise <[email protected]>

_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
