Gitweb links:
...log
http://git.netsurf-browser.org/libhubbub.git/shortlog/0eb6188c3a931063f78b017c621b79709746706e
...commit
http://git.netsurf-browser.org/libhubbub.git/commit/0eb6188c3a931063f78b017c621b79709746706e
...tree
http://git.netsurf-browser.org/libhubbub.git/tree/0eb6188c3a931063f78b017c621b79709746706e
The branch, master has been updated
via 0eb6188c3a931063f78b017c621b79709746706e (commit)
from 73071c0dea1e4bcfd094810d051aebc74e6c648c (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
commitdiff
http://git.netsurf-browser.org/libhubbub.git/commit/?id=0eb6188c3a931063f78b017c621b79709746706e
commit 0eb6188c3a931063f78b017c621b79709746706e
Author: Daniel Silverstone <[email protected]>
Commit: Daniel Silverstone <[email protected]>
Support falling back to space separated charset
Some programs, Apple Mail for example, generate HTML
with appallingly bad meta tags such as:
<meta content="text/html charset=utf-8">
This is bad because (a) there is no http-equiv="Content-Type" and (b)
the content type and charset are not separated by a semicolon.
Sadly, Chrome et al. support this, so Hubbub needs to as well. This
change adjusts the content="" parser to retry if it cannot find
a semicolon, working forwards to the first whitespace instead.
Fixes: #2549
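
The retry described above can be sketched in isolation. This is only an
illustration under stated assumptions, not Hubbub's code: the helper name
skip_media_type() and its offset-returning interface are hypothetical, and
it tests for explicit whitespace characters rather than using Hubbub's
ISSPACE() macro, so it does not need the special case for '/' that the
patch carries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of Hubbub): return the offset just past
 * the separator that ends the media type in a content="" value, or len
 * if no separator is found at all. */
static size_t skip_media_type(const char *value, size_t len)
{
	const char *semi;
	size_t i;

	assert(value != NULL);

	/* Preferred form: "text/html; charset=utf-8" */
	semi = memchr(value, ';', len);
	if (semi != NULL)
		return (size_t)(semi - value) + 1;

	/* Fallback form: "text/html charset=utf-8".
	 * No semicolon, so split at the first whitespace character. */
	for (i = 0; i < len; i++) {
		if (value[i] == ' ' || value[i] == '\t' || value[i] == '\n' ||
		    value[i] == '\f' || value[i] == '\r')
			return i + 1;
	}

	return len;
}
```

With this sketch, both "text/html; charset=utf-8" and
"text/html charset=utf-8" give offset 10, the start of "charset", while a
bare "text/html" gives its own length, meaning no parameter was found.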
diff --git a/src/charset/detect.c b/src/charset/detect.c
index 93cbe63..d2d6816 100644
--- a/src/charset/detect.c
+++ b/src/charset/detect.c
@@ -369,6 +369,7 @@ uint16_t hubbub_charset_parse_attributes(const uint8_t **pos,
 uint16_t hubbub_charset_parse_content(const uint8_t *value,
 		uint32_t valuelen)
 {
+	const uint8_t *restart = value;
 	const uint8_t *end;
 	const uint8_t *tentative = NULL;
 	uint32_t tentative_len = 0;
@@ -388,8 +389,22 @@ uint16_t hubbub_charset_parse_content(const uint8_t *value,
 		value++;
 	}
 
-	if (value >= end)
-		return 0;
+	if (value >= end) {
+		/* Fallback, no semicolon, try for first whitespace */
+		value = restart;
+		while (value < end) {
+			/* This condition is odd, because ISSPACE() includes
+			 * forward slash, which we need to skip so that content
+			 * types work properly.
+			 */
+			if (ISSPACE(*value) && (*value != '/')) {
+				value++;
+				break;
+			}
+
+			value++;
+		}
+	}
 
 	/* 2 */
 	while (value < end && ISSPACE(*value)) {
-----------------------------------------------------------------------
Summary of changes:
src/charset/detect.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
--
HTML5 parser library