XZise has uploaded a new change for review.
https://gerrit.wikimedia.org/r/158848
Change subject: [FIX] Http: Allow custom encoding (or none)
......................................................................
[FIX] Http: Allow custom encoding (or none)
If a non-textual file is requested via httplib2, it shouldn't be
decoded. This also adds the ability to request a specific encoding and
warns if it differs from the encoding named in the response header.
Change-Id: I03609e3d6ec9d8b7f72819358c11d62a792bf4c0
---
M pywikibot/comms/http.py
1 file changed, 21 insertions(+), 6 deletions(-)
git pull ssh://gerrit.wikimedia.org:29418/pywikibot/core refs/changes/48/158848/1
diff --git a/pywikibot/comms/http.py b/pywikibot/comms/http.py
index 25af2a1..239117e 100644
--- a/pywikibot/comms/http.py
+++ b/pywikibot/comms/http.py
@@ -24,6 +24,7 @@
import sys
import atexit
import time
+import codecs
# Verify that a working httplib2 is present.
try:
@@ -204,7 +205,7 @@
return formatted
-def request(site, uri, ssl=False, *args, **kwargs):
+def request(site, uri, ssl=False, encoding=True, *args, **kwargs):
"""Queue a request to be submitted to Site.
All parameters not listed below are the same as
@@ -217,8 +218,12 @@
@param site: The Site to connect to
@param uri: the URI to retrieve (relative to the site's scriptpath)
@param ssl: Use HTTPS connection
- @return: The received data (a unicode string).
-
+ @param encoding: Either a valid encoding (usable for str.encode()), or True
+ to automatically choose the encoding from the returned header (defaults
+ to ASCII), or False to skip decoding.
+ @return: The received data
+ @rtype: a unicode string if encoding is not False; otherwise str
+ (Python 2) or bytes (Python 3)
"""
if site:
if ssl:
@@ -261,10 +266,20 @@
pos = request.data[0]['content-type'].find('charset=')
if pos >= 0:
pos += len('charset=')
- encoding = request.data[0]['content-type'][pos:]
- else:
+ header_encoding = request.data[0]['content-type'][pos:]
+ if encoding is True:
+ encoding = header_encoding
+ elif codecs.lookup(encoding) != codecs.lookup(header_encoding):
+ pywikibot.warning(u'Encoding "{0}" requested but "{1}" received in '
+ 'the header.'.format(encoding, header_encoding))
+ elif encoding is True:
encoding = 'ascii'
# Don't warn, many pages don't contain one
pywikibot.log(u"Http response doesn't contain a charset.")
- return request.data[1].decode(encoding)
+ if encoding is not False:
+ data = request.data[1].decode(encoding)
+ else:
+ data = request.data[1]
+
+ return data
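
For readers who want to try the three-way `encoding` behaviour outside of pywikibot, the following is a minimal standalone sketch, not the code from this change. The names `charset_from_content_type`, `same_codec` and `decode_body` are hypothetical, the standard `logging` module stands in for `pywikibot.warning`, and the header parsing (cutting at `;` and stripping quotes) is an assumption that goes slightly beyond what the patch does. The sketch checks for `False` before calling `codecs.lookup`, since `codecs.lookup` expects a string.

```python
import codecs
import logging

logger = logging.getLogger(__name__)


def charset_from_content_type(content_type):
    """Return the charset named in a Content-Type value, or None."""
    pos = content_type.find('charset=')
    if pos < 0:
        return None
    # Assumption: cut at the next ';' and strip quotes and whitespace.
    value = content_type[pos + len('charset='):].split(';')[0]
    return value.strip(' "\'') or None


def same_codec(first, second):
    """Return True if both names resolve to the same codec (alias-aware)."""
    return codecs.lookup(first).name == codecs.lookup(second).name


def decode_body(body, content_type, encoding=True):
    """Decode body according to the encoding argument.

    encoding=False returns the raw bytes, encoding=True uses the header
    charset (falling back to ASCII), and a string forces that encoding.
    """
    if encoding is False:
        # Non-textual data: hand the bytes back untouched.
        return body
    header_encoding = charset_from_content_type(content_type)
    if encoding is True:
        encoding = header_encoding or 'ascii'
    elif header_encoding and not same_codec(encoding, header_encoding):
        logger.warning('Encoding "%s" requested but "%s" received in '
                       'the header.', encoding, header_encoding)
    return body.decode(encoding)
```

Comparing codec names via `codecs.lookup` rather than comparing the raw strings means that aliases such as `utf8` and `UTF-8` do not trigger a spurious warning.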
--
To view, visit https://gerrit.wikimedia.org/r/158848
Gerrit-MessageType: newchange
Gerrit-Change-Id: I03609e3d6ec9d8b7f72819358c11d62a792bf4c0
Gerrit-PatchSet: 1
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Owner: XZise <[email protected]>