On Wed 22-Feb-17 19:38, Thomas Dickey wrote:
> On Wed, Feb 22, 2017 at 10:32:24PM +0200, Dimitrios Semitsoglou-Tsiapos wrote:
> > Greetings Lynx developers and users!
> >
> > I have noticed that in `-dump` mode lynx will percent-encode reserved
> > characters in the "list of links" if `-display_charset=UTF-8` is set (or
> > perhaps any value other than ISO-8859-1). This can cause some URLs to
> > effectively break.
> >
> > Would it perhaps be correct to simply ignore `display_charset` while
> > printing these URLs?
>
> not really - it's generating the file (not passing it on), and is
> using a known encoding.
>
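As background for the breakage described above: in the example below, the UTF-8 dump lists the link with one layer of percent-encoding removed, and `%26` is the escaped form of `&`. Decoding it therefore turns one query parameter (`ch` with the value `7&di=12345`) into two (`ch=7` and `di=12345`). The following is a minimal bash sketch of that effect, not Lynx code; the helper names `percent_decode_once` and `count_params` are hypothetical, and the decoder assumes the input contains no literal backslashes.

```shell
#!/usr/bin/env bash
# Minimal sketch (not Lynx code): what one extra pass of percent-decoding
# does to the example query from this thread. Requires bash (printf %b \xHH).

# Hypothetical helper: remove one layer of %XX escapes by rewriting each
# "%" as "\x" and letting printf %b expand the resulting \xHH sequences.
# Assumes the input holds no literal backslashes, which %b would also expand.
percent_decode_once() {
    printf '%b' "${1//%/\\x}"
}

# Hypothetical helper: count the &-separated parameters in a query string.
count_params() {
    local IFS='&'
    local -a parts
    read -r -a parts <<< "$1"
    echo "${#parts[@]}"
}

src='ch=7%26di=12345'                 # query as written in the href
dumped=$(percent_decode_once "$src")  # query as listed by the UTF-8 -dump

echo "source: $src -> $(count_params "$src") parameter(s)"
echo "dumped: $dumped -> $(count_params "$dumped") parameter(s)"
```

Running it prints:

    source: ch=7%26di=12345 -> 1 parameter(s)
    dumped: ch=7&di=12345 -> 2 parameter(s)

The same helper also reproduces the other rows of the table below, e.g. `L%252B` becomes `L%2B`.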
I am probably misinterpreting the problem, so I will give an example.

I have received email from ebay where they encode URLs multiple times
within all their links. For example, here are three successive (but not
necessarily consecutive) chunks of a single URL:

    HTML source                         lynx -dump
    ---------------------------------   ---------------------------
    http://rover.ebay.com               http://rover.ebay.com
    https%3A%2F%2Fsvcs.ebay.com         https://svcs.ebay.com
    L%252B                              L%2B
    http%253A%252F%252Frover.ebay.com   http%3A%2F%2Frover.ebay.com

From those I have come up with a minimal example (they probably encode
too much personal information in their arguments for me to upload the
whole URL).

    # Verify the example URL redirects to their home page:
    $ url='https://svcs.ebay.com/delstats/email/location?ch=7%26di=12345'
    $ lynx -dump "$url" | head -1
    #[1]alternate [2]alternate [3]alternate [4]alternate [5]alternate5

    # Verify opening the URL from within lynx works:
    $ echo '<a href='"$url"'>click me</a>' > /tmp/foo.html
    $ lynx /tmp/foo.html
    # Now press return

    # Dump this working file:
    $ lynx -dump /tmp/foo.html
       [1]click me

    References

       1. https://svcs.ebay.com/delstats/email/location?ch=7&di=12345

    # Try to open the resulting URL:
    $ lynx 'https://svcs.ebay.com/delstats/email/location?ch=7&di=12345'
    Looking up svcs.ebay.com
    Making HTTPS connection to svcs.ebay.com
    SSL callback:ok, preverify_ok=1, ssl_okay=0
    SSL callback:ok, preverify_ok=1, ssl_okay=0
    SSL callback:ok, preverify_ok=1, ssl_okay=0
    Verified connection to svcs.ebay.com (cert=svcs.ebay.com)
    Certificate issued by: /C=US/O=Symantec Corporation/OU=Symantec Trust Network/CN=Symantec Class 3 Secure Server CA - G4
    Secure 128-bit TLSv1/SSLv3 (AES128-GCM-SHA256) HTTP connection
    Sending HTTP request.
    HTTP request sent; waiting for response.
    HTTP/1.0 307 Temporary Redirect
    'A'lways allowing from domain '.ebay.com'.
    Alert!: Got redirection with no Location header.
    Data transfer complete
    /bin/gzip -d --no-name /tmp/lynxXXXXuQX0AJ/L15600-7565TMP.bin.gz
    Using file://localhost/tmp/lynxXXXXuQX0AJ/L15600-7565TMP.bin
    hexdump '/tmp/lynxXXXXuQX0AJ/L15600-7565TMP.bin'

    lynx: Start file could not be found or is not text/html or text/plain
    Exiting...

    # This is in fact the error I get when opening a real dumped URL.

    # Now dump with ISO-8859-1:
    $ lynx -display_charset=ISO-8859-1 -dump /tmp/foo.html
       [1]click me

    References

       1. https://svcs.ebay.com/delstats/email/location?ch=7%26di=12345

    # The resulting URL works as expected.

Would ebay be at fault here (for their encoding or server handling),
lynx, or I for using the dumped URL directly?

_______________________________________________
Lynx-dev mailing list
https://lists.nongnu.org/mailman/listinfo/lynx-dev
