On Wednesday 08 June 2016 11:47:46 L. A. Walsh wrote:
> I tried:
>
> wget "http://translate.google.com/#ja/en/クイーンズブレイド・メインテーマB"
>
> But get a an Error "403: Forbidden" (tried w/ and w/o proxy) -- same.
>
> But cut/paste the same URL into IE11 or
> PaleMoon (a 64-bit FF derivative), and it works.
>
> Any idea why or what I might do to get it to work?

Basically, everything from the '#' on (the fragment part of the URL) is irrelevant to the HTTP request.

This is what Firefox 46 sends to localhost:8080 (I started a netcat with 'nc -l -p 8080' to make sure):

GET / HTTP/1.1
Host: localhost:8080
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive

As you can see, the UTF-8 part is never sent, so it is not relevant.

If I do a 'telnet translate.google.com 80' and paste the above (just with 'Host: translate.google.com' and an empty line at the end):

GET / HTTP/1.1
Host: translate.google.com
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive

The answer is

HTTP/1.1 302 Found
Location: https://translate.google.com/
Date: Wed, 08 Jun 2016 19:41:14 GMT
Expires: Wed, 08 Jun 2016 19:41:14 GMT
Cache-Control: private, max-age=0
Content-Type: text/html; charset=UTF-8
Content-Language: en
P3P: CP="This is not a P3P policy! See https://www.google.com/support/accounts/answer/151657?hl=en for more info."
X-Content-Type-Options: nosniff
Server: HTTP server (unknown)
Content-Length: 226
X-XSS-Protection: 1; mode=block
Set-Cookie: NID=79=RWJmTifLbUTlUm1FaGgoWgqajLS-- KpLfeevl5RaKlUp12ntFF3rfOBKvQiElhElP4CYe-5I2gZRYFJEytinX6ATW93FbhmdotpBNbWl8_aOg7AyUTnF57P8rDA0HgTL; expires=Thu, 08-Dec-2016 19:41:14 GMT; path=/; domain=.google.com; HttpOnly

<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
<A HREF="https://translate.google.com/">here</A>.
</BODY></HTML>

Now, trying with 'wget -d <your URL from above>':

GET / HTTP/1.1
User-Agent: Wget/1.17.1.42-42cc8 (linux-gnu)
Accept: */*
Accept-Encoding: identity
Host: translate.google.com
Connection: Keep-Alive

The answer is

HTTP/1.1 403 Forbidden
Content-Type: text/html; charset=UTF-8
X-Content-Type-Options: nosniff
Date: Wed, 08 Jun 2016 19:34:21 GMT
Server: HTTP server (unknown)
Cache-Control: private
X-XSS-Protection: 1; mode=block
Accept-Ranges: none
Vary: Accept-Encoding
Transfer-Encoding: chunked

<body skipped>

My guess is that Google does not like the User-Agent 'wget'. Now trying with Firefox's User-Agent:

$ wget -d -U "Mozilla/5.0 (X11; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0" http://translate.google.com

And zack ... that works. Give it a try.

Regards,
Tim
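
The two observations above (the fragment never reaches the server, and the User-Agent header is what differs between the requests) can be reproduced offline with a small sketch. It is only an illustration using Python's standard library, not what wget does internally; the helper names request_target and build_request are made up for this example, and nothing is sent over the network.

```python
from urllib.parse import urldefrag, urlsplit
from urllib.request import Request

# User-Agent string copied from the Firefox 46 request shown in the message.
FIREFOX_UA = ("Mozilla/5.0 (X11; Linux x86_64; rv:46.0) "
              "Gecko/20100101 Firefox/46.0")


def request_target(url: str) -> str:
    """Return the path (plus query) that would appear in the GET line.

    Hypothetical helper: the fragment after '#' is deliberately dropped,
    because HTTP clients do not send it to the server.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    return path + ("?" + parts.query if parts.query else "")


def build_request(url: str, user_agent: str) -> Request:
    """Build (but do not send) a request with a custom User-Agent.

    Hypothetical helper: strips the fragment first, then sets the header,
    which is the same idea as 'wget -U "<agent>" <url>'.
    """
    url_without_fragment, _fragment = urldefrag(url)
    return Request(url_without_fragment, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    url = "http://translate.google.com/#ja/en/クイーンズブレイド・メインテーマB"
    parts = urlsplit(url)
    print("host:    ", parts.netloc)
    print("GET line:", "GET", request_target(url), "HTTP/1.1")
    print("fragment:", parts.fragment, "(stays on the client)")

    req = build_request(url, FIREFOX_UA)
    print("URL used:", req.full_url)
    print("UA sent: ", req.get_header("User-agent"))
```

Passing the resulting object to urllib.request.urlopen would actually send the request; whether the server then answers with 302, 200 or 403 depends on the server, as the wget runs above show.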