Hi Stephen,
On Tuesday, 31 March 2015, 18:11:58, Stephen Wells wrote:
> Dear all - I am currently trying to use wget to obtain mp3 files from the
> Google Translate TTS system. In principle this can be done using:
>
> wget -U Mozilla -O "${string}.mp3" "http://translate.google.com/translate_tts?tl=TL&q=${string}"
>
> where TL is a two-letter language code (en, fr, de, and so on).
>
> However I am meeting a serious error when I try to send Russian strings
> (tl=ru) in Cyrillic characters. I'm working in a UTF-8 environment (under
> Cygwin) and the file system will display the Cyrillic strings no problem.
> If I provide a command like this:
>
> http://translate.google.com/translate_tts?tl=ru&q=мазать
>
> wget incorrectly processes the Cyrillic characters _before_ sending the
> http request, so what it actually requests is:
>
> http://translate.google.com/translate_tts?tl=ru&q=%D0%BC%D0%B0%D0%B7%D0%B0%D1%82%D1%8C

This seems to be the correct behavior of a web client. The URL in the GET
request is transmitted UTF-8 encoded, and percent-escaping is performed for
chars > 127 (not mentioning control chars here).

> This of course produces a string of gibberish in the resulting mp3 file!

This is something different. If you are talking about the file name, well,
there is --restrict-file-names=nocontrol. Did you give it a try?

> Is there any way to make wget actually send the string it is given, instead
> of mangling it on the way out? This is really blocking me.

From what you write, I am unsure whether you are talking about the resulting
file name or about HTTP URL encoding in a GET request.

Regards, Tim
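
P.S. To check that the escaped URL is nothing more than the UTF-8 bytes of
your string, here is a small sketch in Python. It is independent of wget and
only illustrates the encoding; the variable names are mine, and it assumes the
word from your example.

```python
from urllib.parse import quote, unquote

# The Cyrillic word from the example URL.
word = "мазать"

# quote() encodes the text as UTF-8 and percent-escapes every byte
# outside the unreserved ASCII set.
escaped = quote(word)
print(escaped)

# Decoding the escaped form gives back the original string, so no
# information is lost on the way to the server.
print(unquote(escaped) == word)
```

If the escaped string printed here matches the query part of the request you
saw, then the text was sent intact and the problem lies elsewhere.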
