On Fri, 31 Mar 2000, Vlad Harchev wrote:
>
> * Fixed problem with charset handling for SOURCE_CACHE:MEMORY - if the charset
> was specified by Content-Type http header it won't be lost (as it was) when
> reparsing from source cache (applies to toggling DTD, switching to srcview,
> and other stuff like *,' etc).
> Notes:
> 2) As for SOURCE_CACHE:MEMORY charset loss problem - I copied the piece of
> code from the SOURCE_CACHE:FILE branch and everything seems to work fine
> now (switching to src view and back in particular).
The piece of code didn't really make sense in the SOURCE_CACHE:FILE branch,
either. The comment says:
/*
* This is more or less copied out of HTLoadFile(), except we don't
* get a content encoding. This may be overkill. -dsb
*/
There is no need to use the logic copied from HTFile.c here. Its purpose
is to determine the 'format'. But (as long as the source cache is only
used as it is now), we already know that it is WWW_HTML. The
SOURCE_CACHE:MEMORY branch just sets this explicitly. The SOURCE_CACHE:FILE
branch doesn't; at best, and normally, it will come to exactly the same
conclusion; at worst, it will come to the wrong conclusion (i.e., some
'format' other than WWW_HTML).
The 'problem' is caused by the call to HTCharsetFormat. It is pointless
as far as I can see, since the 'format' passed to it should never have
a ';charset=xxxx' component at that point. But it has the side effect
of removing the explicit charset info from the anchor
('FREE(anchor->charset);'). That's what you are trying to avoid.
You _do_ manage to avoid HTCharsetFormat(), since in the normal case
HTMainAnchor->content_type will be present. But if the call is never
useful, it should just be removed from HTreparse_document.
The more fundamental problem is that the source cache _relies_ on
HTMainAnchor->charset still being present (and valid) when
HTreparse_document gets called, instead of keeping track of it on its
own. (A 'real' cache also has to cache meta-information.) That's not
always reliable - the HTMainAnchor->charset may disappear in various
situations.
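To illustrate what "keeping track of it on its own" could mean, here is
a minimal sketch. It is not Lynx code: ToyAnchor, ToyCacheEntry and the
toy_* functions are all made-up names, and the real anchor and cache
structures are more involved. The idea is only that the cache copies
the charset when it stores the source, and puts it back before
reparsing if the anchor has lost it in the meantime.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the one anchor field that matters here. */
typedef struct {
    char *charset;              /* may be freed by unrelated code paths */
} ToyAnchor;

/* Hypothetical cache entry: cached source plus its own meta-information. */
typedef struct {
    char *source;
    char *charset;              /* private copy; NULL if none was known */
} ToyCacheEntry;

/* Portable string copy; returns NULL for NULL input or failed allocation. */
static char *toy_strdup(const char *s)
{
    char *copy;

    if (s == NULL)
        return NULL;
    copy = malloc(strlen(s) + 1);
    if (copy != NULL)
        strcpy(copy, s);
    return copy;
}

/* When caching the source, also copy the charset known at that time. */
static void toy_cache_store(ToyCacheEntry *entry, const char *source,
                            const ToyAnchor *anchor)
{
    entry->source = toy_strdup(source);
    entry->charset = toy_strdup(anchor->charset);
}

/* Before reparsing, restore the charset if the anchor no longer has it. */
static void toy_cache_restore_charset(const ToyCacheEntry *entry,
                                      ToyAnchor *anchor)
{
    if (anchor->charset == NULL && entry->charset != NULL)
        anchor->charset = toy_strdup(entry->charset);
}

/* Release everything the cache entry owns. */
static void toy_cache_free(ToyCacheEntry *entry)
{
    free(entry->source);
    free(entry->charset);
    entry->source = NULL;
    entry->charset = NULL;
}
```

With something along these lines, a reparse would call the restore step
first, so it would not matter whether some other code path had freed
the anchor's charset in between.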
As a demonstration of the last point, try the following (I haven't
done exactly this): Choose a HTTP HTML document ('A') that needs
translation, and which has the charset in a META tag (_not_ in the
HTTP header).
- Switch to source mode and back several times - everything should
work. (End in normal mode).
- Press 'V' (or anything that will give you a link to A).
- Press 'd' on the link to A. Let the download proceed so that you
get to the Download Options page.
- Don't do anything on the D.O. page - or do, if you like -, just
go back with Left Arrow (twice) to A.
- NOW toggle '\' again. What do you see?
> diff -ru lynx2-8-3-was/src/GridText.c lynx2-8-3/src/GridText.c
> --- lynx2-8-3-was/src/GridText.c Mon Mar 27 08:14:00 2000
> +++ lynx2-8-3/src/GridText.c Fri Mar 31 12:28:36 2000
> @@ -8365,6 +8365,9 @@
> FREE(HTMainAnchor->UCStages);
> }
>
> + if (HTMainAnchor->content_type) {
> + format = HTAtom_for(HTMainAnchor->content_type);
> + } else {
> /*
> * This is only done to make things aligned with SOURCE_CACHE_NONE and
> * SOURCE_CACHE_FILE when switching to source mode since the original
> @@ -8373,8 +8376,9 @@
> * user-visible benefits, seems just '=' Info Page will show source's
> * effective charset as "(assumed)".
> */
> - format = HTCharsetFormat(format, HTMainAnchor,
> - UCLYhndl_for_unspec);
> + format = HTCharsetFormat(format, HTMainAnchor,
> + UCLYhndl_for_unspec);
> + };
(This call removes HTMainAnchor->charset, which is never useful afaics.
But since HTMainAnchor->charset may disappear unexpectedly anyway,
the source cache logic should not rely on it as the sole source
of charset information.)
> /* not UCLYhndl_HTFile_for_unspec - we are talking about remote documents... */
>
> if (HText_HaveUserChangedForms()) {
Klaus