On Fri, 31 Mar 2000, Klaus Weide wrote:
> On Fri, 31 Mar 2000, Vlad Harchev wrote:
>
> >
> > * Fixed problem with charset handling for SOURCE_CACHE:MEMORY - if the charset
> > was specified by Content-Type http header it won't be lost (as it was) when
> > reparsing from source cache (applies to toggling DTD, switching to srcview,
> > and other stuff like *,' etc).
>
> > Notes:
> > 2) As for SOURCE_CACHE:MEMORY charset loss problem - I copied the piece of
> > code from the SOURCE_CACHE:FILE branch and everything seems to work fine
> > now (switching to src view and back in particular).
>
> The piece of code didn't really make sense in the SOURCE_CACHE:FILE branch,
> either. The comment says:
> /*
> * This is more or less copied out of HTLoadFile(), except we don't
> * get a content encoding. This may be overkill. -dsb
> */
> There is no need to use the logic copied from HTFile.c here. Its purpose
> is to determine the 'format'. But (as long as the source cache is only
> used as it is now), we already know that it is WWW_HTML. The SOURCE_CACHE:
> MEMORY branch just sets this explicitly. The SOURCE_CACHE:FILE branch
> doesn't; at best, and normally, it will come to exactly the same conclusion;
Which "same conclusion" - the correct one?
> at worst, it will come to the wrong conclusion (i.e., some 'format' other
> than WWW_HTML).
Probably you are right (I don't understand this stuff completely - my
approach was rather "trial and error"), but with my patch lynx behaves much
better than before (document charset is not lost with SOURCE_CACHE:FILE). Could
you implement this fix correctly as you propose (if you consider my patch
insufficient)?
> The 'problem' is caused by the call to HTCharsetFormat. It is pointless
> as far as I can see, since the 'format' passed to it should never have
> a ';charset=xxxx' component at that point. But it has the side effect
> of removing the explicit charset info from the anchor ('FREE(anchor->
> charset);'). That's what you are trying to avoid.
>
> You _do_ manage to avoid HTCharsetFormat(), since in the normal case
> HTMainAnchor->content_type will be present. But if the call is never
> useful, it should just be removed from HTreparse_document.
Yes, it seems the "else" branch is never executed. It should probably be
removed, but I think we should remove it only in the next development cycle.
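To make the side effect described above concrete, here is a minimal sketch.
It is only a model, not the real lynx code: DemoAnchor, demo_copy() and
demo_charset_format() are hypothetical names, and the only behaviour taken
from this thread is that the stored charset is freed before the 'format'
string is examined.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the anchor; not the real lynx structure. */
typedef struct {
    char *content_type;
    char *charset;
} DemoAnchor;

/* Portable string copy helper (hypothetical); returns NULL for NULL input. */
static char *demo_copy(const char *s)
{
    char *p;

    if (s == NULL)
        return NULL;
    p = malloc(strlen(s) + 1);
    if (p != NULL)
        strcpy(p, s);
    return p;
}

/*
 * Model of the side effect only: the charset stored on the anchor is
 * dropped first, and is set again only if 'format' carries a
 * "charset=" parameter.
 */
static void demo_charset_format(const char *format, DemoAnchor *anchor)
{
    const char *cp = strstr(format, "charset=");

    free(anchor->charset);      /* explicit charset info is lost here */
    anchor->charset = NULL;
    if (cp != NULL)
        anchor->charset = demo_copy(cp + strlen("charset="));
}
```

With a plain "text/html" format (no charset parameter), the anchor's charset
is gone after the call, which is the loss my patch avoids by not reaching
that call.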
> The more fundamental problem is that the source cache _relies_ on the
> HTMainAnchor->charset to be still present (and valid) when HTparse_
> document gets called, instead of keeping track of it on its own.
> (A 'real' cache also has to cache meta-information.) That's not
> always reliable - the HTMainAnchor->charset may disappear in various
> situations.
In which situations can HTMainAnchor->charset be lost? How common are such
situations?
But I agree that it's better to save the charset info somehow.
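One possible way to save it is sketched below, assuming the cache entry
simply keeps its own copy of the charset next to the cached source.
DemoCacheEntry and the demo_cache_*() functions are hypothetical names made
up for illustration; nothing like this exists in the current code as far as
I know.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical cache entry: the source plus its own meta-information. */
typedef struct {
    char *source;   /* cached document source */
    char *charset;  /* charset in effect when cached, or NULL if unknown */
} DemoCacheEntry;

/* Portable string copy helper (hypothetical); returns NULL for NULL input. */
static char *demo_dup(const char *s)
{
    char *p;

    if (s == NULL)
        return NULL;
    p = malloc(strlen(s) + 1);
    if (p != NULL)
        strcpy(p, s);
    return p;
}

/* Store source and charset together; returns 0 on success, -1 on failure. */
static int demo_cache_store(DemoCacheEntry *e, const char *source,
                            const char *charset)
{
    char *src = demo_dup(source);
    char *cs = demo_dup(charset);

    if (src == NULL || (charset != NULL && cs == NULL)) {
        free(src);
        free(cs);
        return -1;
    }
    free(e->source);
    free(e->charset);
    e->source = src;
    e->charset = cs;
    return 0;
}

/*
 * Charset to use when reparsing: prefer the copy saved with the source,
 * and fall back to whatever the anchor still has.
 */
static const char *demo_cache_charset(const DemoCacheEntry *e,
                                      const char *anchor_charset)
{
    return e->charset != NULL ? e->charset : anchor_charset;
}

/* Release everything held by the entry. */
static void demo_cache_clear(DemoCacheEntry *e)
{
    free(e->source);
    free(e->charset);
    e->source = NULL;
    e->charset = NULL;
}
```

Preferring the saved copy is just one choice; the point is only that a
reparse would no longer depend on the anchor still having its charset.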
> As a demonstration of the last point, try the following (I haven't
> done exactly this): Choose a HTTP HTML document ('A') that needs
> translation, and which has the charset in a META tag (_not_ in the
> HTTP header).
> - Switch to source mode and back several times - everything should
> work. (End in normal mode).
> - Press 'V' (or anything that will give you a link to A).
> - Press 'd' on the link to A. Let the download proceed so that you
> get to the Download Options page.
> - Don't do anything on the D.O. page - or do, if you like -, just
> go back with Left Arrow (twice) to A.
> - NOW toggle '\' again. What do you see?
I tried this with the patched lynx - the info page says that the charset is
the same, but adds that it is assumed (the display is not corrupted, though -
the correct translation is used). What should I see?
>[...]
Best regards,
-Vlad