On Sat, 4 Mar 2000, Ivan Zakharyaschev wrote:
>
> First, I go to http://sun.ru and then to
> http://sun.ru/developers/index.html.
>
> Step 1. I start like this:
>
> $ lynx --assume-charset windows-1251 sun.ru
>
> The displayed page is OK (readable). The O)ptions may be checked now:
>
> Display charset: Cyrillic (koi8-r)
> Assumed document charset: windows-1251
> Raw 8-bit: OFF
>
> The info (shown with '=') includes a few interesting lines:
>
> Charset: koi8-r (assumed)
> Cache Control: no cache
>
> (It's strange, isn't it?)
Yes, it's strange in two respects:
1. The no-cache: I don't get anything like that (but I have not tested
with exactly your version).
2. The charset, which contradicts what lynx has really assumed (otherwise
the page wouldn't be readable).
I think I found the explanation for 2. - see below.
> Step 2. I press 'x' on a link called "For developers". The associated
> document is loaded, and this time the characters are not shown correctly.
> The options and document info (charset and cache-control) remain the same
> as on the main (first) page.
(Since you hadn't loaded <http://sun.ru/developers/index.html> before, it
doesn't make a difference whether you use 'x' or Enter. At least it
shouldn't, and I don't know of anything in the code that would cause a
difference here.)
>
> Step 3. After R)eloading the document is rendered correctly. The info and
> the options again remain the same.
Thanks for the details.
> KW> Also, you were using 2.8.3dev.14, and there have been significant
> KW> changes in this area since then (esp. wrt source cache). Could you
> KW> get the latest devel code from <http://lynx.isc.org/current/> and
> KW> check whether the same problem still exists.
>
> I will do it soon and tell you the results.
Please try the following (preferably with the latest version).
Make two changes in HTML.c:
a. Find the line
       int dest_char_set = UCLYhndl_for_unrec;
   and replace it with
       int dest_char_set = -1;
b. Find the line
       if (dest) {
   and replace it with
       if (dest && dest_char_set >= 0) {
This (partially) reverts the code that is meant to handle the ACCEPT-CHARSET
attribute on A tags to its original state; it seems a logical error was
introduced quite a while ago (at least as early as 2.8.1), but the error
shows up only when ASSUME_UNREC_CHARSET or -assume_unrec_charset is used.
(I assume that few people use that, and those who do probably have it
set to the same value as ASSUME_CHARSET, so the error wasn't discovered
earlier. Also, the document you tested with has several links to itself
['<A HREF="" ...'], which makes the problem apparent for the initial
page [in the form of wrong charset info on the '=' screen].)
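A small stand-alone C model of what the two edits change may help when testing. It is only a sketch under stated assumptions: the function names, the `has_dest` and `accept_handle` parameters and the handle value 5 are hypothetical, and the surrounding logic is simplified. Only the initialization of `dest_char_set` and the `if` condition are taken from the patch. The model also assumes `accept_handle` is -1 when the A tag carries no ACCEPT-CHARSET attribute.

```c
#include <assert.h>

/* Hypothetical stand-in for UCLYhndl_for_unrec (the -assume_unrec_charset
   handle); in lynx the real value depends on the configuration. */
#define UNREC_HANDLE 5

/* Simplified, hypothetical model of choosing the charset to record for a
   link target. accept_handle is the handle requested via ACCEPT-CHARSET,
   or -1 if the A tag has none (assumption). Returns the handle to record,
   or -1 meaning "record nothing". */
static int charset_for_link_old(int has_dest, int accept_handle)
{
    int dest_char_set = UNREC_HANDLE;   /* old initialization */

    assert(accept_handle >= -1);
    if (accept_handle >= 0)
        dest_char_set = accept_handle;
    if (has_dest)                       /* old test: true for every link */
        return dest_char_set;
    return -1;
}

static int charset_for_link_new(int has_dest, int accept_handle)
{
    int dest_char_set = -1;             /* sentinel: nothing requested */

    assert(accept_handle >= -1);
    if (accept_handle >= 0)
        dest_char_set = accept_handle;
    if (has_dest && dest_char_set >= 0) /* patched test */
        return dest_char_set;
    return -1;
}
```

For a plain link such as `<A HREF="">`, `charset_for_link_old(1, -1)` returns `UNREC_HANDLE`, so in this model the unrec charset is recorded for the target, while `charset_for_link_new(1, -1)` returns -1 and records nothing. If the unrec handle equals the handle of the assumed charset, the old version records the value the page already had, which fits the observation that the error rarely shows.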
These changes are untested so far; please let us know whether they
make lynx behave as expected.
Klaus