Package: wget
Version: 1.18-5+deb9u2
Severity: important
Tags: security
Tags: patch
Fixed: 1.20.1-1

Dear maintainer,

the 09-stretch version of wget --convert-links segfaults when it
encounters an embedded image (a data: URI in a srcset attribute) and
tries to parse it as a link.


How to repeat

Save the following file as "index.html" in the webroot of a web server
under your control. In the given example it's "localhost".

====================================================================
<html>
<head>
<title>title</title>
</head>
<body>
<img 
srcset="data:image/gif;base64,AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"/>
</body>
</html>
====================================================================

Run "wget --convert-links http://localhost/index.html --debug"


Observed:

| (...)
| Length: 161 [text/html]
| Saving to: ‘index.html’
| 
| index.html                  100%[=========================================>]     161  --.-KB/s    in 0s
| 
| 2019-03-12 00:00:00 (12,1 MB/s) - ‘index.html’ saved [161/161]
| 
| Scanning index.html (from http://localhost/index.html)
| Loaded index.html (size 161).
| URI encoding = ‘UTF-8’
| index.html: merge(‘http://localhost/index.html’, ‘data:image/gif;base64,AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA’) -> data:image/gif;base64,AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
| index.html: merged link "data:image/gif;base64,AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA" doesn't parse.
| Segmentation fault


Expected:

| (...)
| Length: 161 [text/html]
| Saving to: ‘index.html’
| 
| index.html                  100%[=========================================>]     161  --.-KB/s    in 0s
| 
| 2019-03-12 00:00:00 (16,4 MB/s) - ‘index.html’ saved [161/161]
| 
| Scanning index.html (from http://localhost/index.html)
| Loaded index.html (size 161).
| URI encoding = ‘UTF-8’
| index.html: merge(‘http://localhost/index.html’, ‘data:image/gif;base64,AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA’) -> data:image/gif;base64,AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
| index.html: merged link "data:image/gif;base64,AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA" doesn't parse.
| no-follow in index.html: 0
| Converting links in index.html... nothing to do.
| Converted links in 1 files in 0,001 seconds.

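This was fixed in 10-buster/sid (1.20). The change is fairly simple:
append_url() can return NULL when the merged link does not parse, so
the result must be checked before it is dereferenced. See the attached
patch. Please apply when convenient.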
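
For readers without the wget source at hand, the idea behind the fix can
be sketched in isolation. This is only a minimal illustration, not wget's
actual code: struct urlpos_sketch, append_url_sketch() and
record_srcset_link() are hypothetical names, and rejecting every link
that starts with "data:" is an assumption standing in for wget's real
URL parser.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for wget's struct urlpos. */
struct urlpos_sketch
{
  unsigned int link_inline_p : 1;
  unsigned int link_noquote_html_p : 1;
};

/* Hypothetical stand-in for append_url(): returns NULL when the link
   cannot be parsed.  Assumption: "cannot be parsed" is approximated
   here by rejecting anything that starts with "data:".  */
struct urlpos_sketch *
append_url_sketch (const char *link)
{
  if (strncmp (link, "data:", 5) == 0)
    return NULL;
  return calloc (1, sizeof (struct urlpos_sketch));
}

/* Mirrors the shape of the patched srcset branch: the flags are set
   only if an entry was actually created.  Stores the entry (or NULL)
   in *out and returns 1 if the link was recorded, 0 if it was skipped.
   The caller owns the entry and must free() it.  */
int
record_srcset_link (const char *link, struct urlpos_sketch **out)
{
  struct urlpos_sketch *up;

  assert (out != NULL);
  up = append_url_sketch (link);
  if (up)
    {
      up->link_inline_p = 1;
      up->link_noquote_html_p = 1;
    }
  *out = up;
  return up != NULL;
}
```

Without the "if (up)" guard, calling record_srcset_link() with the data:
URI from the test page would write through a NULL pointer, which matches
the segmentation fault observed above.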

The 08-jessie version is not affected.

Cheers,
    Christoph


-- System Information:
Debian Release: 9.8
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'proposed-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.19.26 (SMP w/4 CPU cores)
Locale: LANG=de_DE.UTF-8, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8), LANGUAGE=de_DE.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: unable to detect

Versions of packages wget depends on:
ii  libc6        2.24-11+deb9u4
ii  libgnutls30  3.5.8-5+deb9u4
ii  libidn11     1.33-1
ii  libnettle6   3.3-1+b2
ii  libpcre3     2:8.39-3
ii  libpsl5      0.17.0-3
ii  libuuid1     2.29.2-1+deb9u1
ii  zlib1g       1:1.2.8.dfsg-5

Versions of packages wget recommends:
ii  ca-certificates  20161130+nmu1+deb9u1

wget suggests no packages.

-- no debconf information

--- a/src/html-url.c
+++ b/src/html-url.c
@@ -729,8 +729,11 @@
                                             srcset + url_end);
               struct urlpos *up = append_url (url_text, base_ind + url_start,
                                               url_end - url_start, ctx);
-              up->link_inline_p = 1;
-              up->link_noquote_html_p = 1;
+              if (up)
+                {
+                  up->link_inline_p = 1;
+                  up->link_noquote_html_p = 1;
+                }
               xfree (url_text);
             }
 
