Re: lynx-dev bad behavior: changing assumed document charset

Klaus Weide Sat, 4 Mar 2000 20:02:40 -0800
On Sat, 4 Mar 2000, Ivan Zakharyaschev wrote:

> 
> First, I go to http://sun.ru and then to
> http://sun.ru/developers/index.html. 
> 
> Step 1. I start like this:
> 
> $ lynx --assume-charset windows-1251 sun.ru
> 
> The displayed page is OK (readable). The O)ptions may be checked now:
> 
> Display charset: Cyrillic (koi8-r)
> Assumed document charset: windows-1251
> Raw 8-bit: OFF
> 
> The info (shown with '=') includes a few interesting lines:
> 
> Charset: koi8-r (assumed)
> Cache Control: no cache
> 
> (It's strange, isn't it?)

Yes, it's strange in two respects:
1. the no-cache - I don't get anything like that (but I was not testing
with exactly your version).
2. The charset, which contradicts what lynx has really assumed (otherwise
the page wouldn't be readable).

I think I found the explanation for 2. - see below.

> Step 2. I press 'x' on a link called "For developers". The associated

(Since you hadn't loaded <http://sun.ru/developers/index.html> before, it
doesn't make a difference whether you use 'x' or Enter.  At least it
shouldn't, and I don't know of anything in the code that would cause a
difference here.)

> document is loaded, and this time the characters are not shown correctly.
> The options and document info (charset and cache-control) remain the same 
> as on the main (first) page.
> 
> Step 3. After R)eloading the document is rendered correctly. The info and
> the options again remain the same.

Thanks for the details.

> KW> Also, you were using 2.8.3dev.14, and there have been significant
> KW> changes in this area since then (esp. wrt source cache).  Could you
> KW> get the latest devel code from <http://lynx.isc.org/current/> and
> KW> check whether the same problem still exists.
> 
> I will do it soon and tell you the results.

Please try the following (preferably with the latest version).
Make two changes in HTML.c:
a. Find the line

    int dest_char_set = UCLYhndl_for_unrec;

and replace it with

    int dest_char_set = -1;

b. Find the line

            if (dest) {

and replace it with

            if (dest && dest_char_set >= 0) {


This (partially) reverts code that is meant to handle the ACCEPT-CHARSET
attribute on A tags to the original state; it seems a logical error was
introduced quite a while ago (since at least 2.8.1), but this error shows
up only when ASSUME_UNREC_CHARSET or -assume_unrec_charset is used.
(I assume that few people use that, and those that do probably have it
set to the same as ASSUME_CHARSET normally; so the error wasn't discovered
earlier.  Also, the document you tested with has several links to itself
['<A HREF="" ...'], which makes the problem apparent for the initial
page [in the form of wrong charset info on the '=' screen].)
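To illustrate the idea behind the two changes (this is only a sketch, not
the actual HTML.c code): the struct, the function name, and the
CHARSET_UNKNOWN constant below are hypothetical stand-ins. It assumes that
the link handling should record a charset for the destination only when
the link itself supplied one, and should otherwise leave the destination
alone instead of stamping it with a global default.

```c
/* Hypothetical stand-ins for illustration; not the real lynx structures. */
#define CHARSET_UNKNOWN (-1)

struct anchor_info {
    int charset;    /* charset handle recorded for this document, or -1 */
};

/*
 * Record charset info on a link destination.
 * explicit_charset is the handle taken from the link's attribute,
 * or a negative value if the link gave none (or an unrecognized one).
 */
static void note_link_charset(struct anchor_info *dest, int explicit_charset)
{
    /* Change (a): start from "unknown" rather than a global default. */
    int dest_char_set = CHARSET_UNKNOWN;

    if (explicit_charset >= 0)
        dest_char_set = explicit_charset;

    /* Change (b): touch the destination only if we really know a charset. */
    if (dest && dest_char_set >= 0)
        dest->charset = dest_char_set;
}
```

With this shape, a link without charset information leaves whatever was
already recorded for the destination untouched, which matters in
particular when a document links to itself.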

These changes are untested so far; please let us know whether they
make lynx behave as expected.

   Klaus