On Fri, 31 Mar 2000, Vlad Harchev wrote:
> On Fri, 31 Mar 2000, Klaus Weide wrote:
> > /*
> > * This is more or less copied out of HTLoadFile(), except we don't
> > * get a content encoding. This may be overkill. -dsb
> > */
> > There is no need to use the logic copied from HTFile.c here. Its purpose
> > is to determine the 'format'. But (as long as the source cache is only
> > used as it is now), we already know that it is WWW_HTML. The SOURCE_CACHE:
> > MEMORY branch just sets this explicitly. The SOURCE_CACHE:FILE branch
> > doesn't; at best, and normally, it will come to exactly the same conclusion;
>
> Which "same conclusion" - the correct one?
Yes, the conclusion that 'format' gets set to WWW_HTML.
>
> > at worst, it will come to the wrong conclusion (i.e., some 'format' other
> > than WWW_HTML).
>
> Probably you are right (I don't understand this stuff completely - my
> approach was rather "trial and error") but with my patch lynx behaves much
> better than before (document charset is not lost if SOURCE_CACHE:FILE). Could
> you implement this fix correctly as you propose (if you consider my patch is
> not enough)?
>[...]
> Yes, it seems the "else" branch is never executed. Probably it should be
> removed. But I think we should remove it only in the next development cycle.
Ok, agreed.
> > The more fundamental problem is that the source cache _relies_ on the
> > HTMainAnchor->charset to be still present (and valid) when
> > HTparse_document gets called, instead of keeping track of it on its own.
> > (A 'real' cache also has to cache meta-information.) That's not
> > always reliable - the HTMainAnchor->charset may disappear in various
> > situations.
>
> In which situations can HTMainAnchor->charset be lost? How common are such
> situations?
See below.
> But I agree that it's better to save charset info somehow.
>
> > As a demonstration of the last point, try the following (I haven't
> > done exactly this): Choose a HTTP HTML document ('A') that needs
> > translation, and which has the charset in a META tag (_not_ in the
> > HTTP header).
> > - Switch to source mode and back several times - everything should
> > work. (End in normal mode).
> > - Press 'V' (or anything that will give you a link to A).
> > - Press 'd' on the link to A. Let the download proceed so that you
> > get to the Download Options page.
> > - Don't do anything on the D.O. page - or do, if you like -, just
> > go back with Left Arrow (twice) to A.
> > - NOW toggle '\' again. What do you see?
>
> I tried with patched lynx - info page says that charset is the same, but it
> adds that it's assumed (but display is not corrupted - correct translation is
> used). What should I see?
The '(assumed)' indicates that HTMainAnchor->charset has disappeared.
If the character translation still is done right, that must then be
because it was still saved in the HTAnchor->UCStages... which happens
to survive the re-loading (it would be cleared when _not_ going to
source mode:
    /*
     * This magic FREE(anchor->UCStages) call
     * stolen from HTuncache_current_document() above.
     */
    if (!(HTOutputFormat && HTOutputFormat == WWW_SOURCE)) {
        FREE(HTMainAnchor->UCStages);
    }
). So in this case, things work out; it's just a demonstration that
anchor->charset can't always be relied on to remain in HTMainAnchor.
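
To illustrate what "keeping track of it on its own" could mean, here is a
minimal sketch in C. It is not lynx code: DemoAnchor, DemoSourceCache,
demo_strdup, demo_cache_store and demo_cache_charset are all hypothetical
names made up for this example, and the real anchor structure has many
more fields. The idea is only that the cache copies the charset when the
source is stored, so a later reparse does not depend on the anchor still
holding it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the anchor; only the field discussed here. */
typedef struct {
    char *charset;      /* may be cleared by unrelated code paths */
} DemoAnchor;

/* Hypothetical cache entry that keeps its own copy of meta-information.
 * Must be zero-initialized before the first demo_cache_store() call. */
typedef struct {
    char *body;         /* cached document source */
    char *charset;      /* charset known at caching time, or NULL */
} DemoSourceCache;

/* Portable string copy; returns NULL for NULL input or on allocation failure. */
static char *demo_strdup(const char *s)
{
    char *copy;

    if (s == NULL)
        return NULL;
    copy = malloc(strlen(s) + 1);
    if (copy != NULL)
        strcpy(copy, s);
    return copy;
}

/* Store the source together with the charset the anchor has right now. */
static void demo_cache_store(DemoSourceCache *cache, const char *body,
                             const DemoAnchor *anchor)
{
    assert(cache != NULL && anchor != NULL);
    free(cache->body);
    free(cache->charset);
    cache->body = demo_strdup(body);
    cache->charset = demo_strdup(anchor->charset);
}

/* Charset to use when reparsing: the anchor's value if it is still
 * there, otherwise the copy saved in the cache entry. */
static const char *demo_cache_charset(const DemoSourceCache *cache,
                                      const DemoAnchor *anchor)
{
    if (anchor->charset != NULL)
        return anchor->charset;
    return cache->charset;
}
```

With something along these lines, the reparse path would still find the
charset after the anchor's copy has been dropped, instead of relying on
UCStages happening to survive.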
Klaus