On Fri, 31 Mar 2000, Vlad Harchev wrote:
> On Fri, 31 Mar 2000, Klaus Weide wrote:
> >         /*
> >          * This is more or less copied out of HTLoadFile(), except we don't
> >          * get a content encoding.  This may be overkill.  -dsb
> >          */
> > There is no need to use the logic copied from HTFile.c here.  Its purpose
> > is to determine the 'format'.  But (as long as the source cache is only
> > used as it is now), we already know that it is WWW_HTML.  The SOURCE_CACHE:
> > MEMORY branch just sets this explicitly.  The SOURCE_CACHE:FILE branch
> > doesn't; at best, and normally, it will come to exactly the same conclusion;
> 
>  Which "same conclusion" - the correct one?

Yes, the conclusion that 'format' gets set to WWW_HTML.
> 
> > at worst, it will come to the wrong conclusion (i.e., some 'format' other
> > than WWW_HTML).
> 
>    Probably you are right (I don't understand this stuff completely - my
> approach was rather "trial and error") but with my patch lynx behaves much
> better than before (document charset is not lost if SOURCE_CACHE:FILE). Could
> you implement this fix correctly as you propose (if you consider my patch is
> not enough)?
>[...] 
>   Yes, it seems the "else" branch is never executed. It should probably be
> removed, but I think we should remove it only in the next development cycle.

Ok, agreed.
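
To illustrate the 'format' point above, here is a minimal sketch, not
Lynx code.  All names (DemoFormat, demo_guess_format,
demo_cached_source_format) are made up for the example, and the real
logic in HTFile.c is considerably more elaborate than a suffix check.
It only shows why guessing can go wrong where an explicit setting
cannot:

```c
#include <string.h>

/* Hypothetical stand-ins for formats such as WWW_HTML; not Lynx's types. */
typedef enum { DEMO_WWW_HTML, DEMO_WWW_PLAINTEXT } DemoFormat;

/*
 * Guess a format from the file name suffix.  This is an assumed,
 * simplified stand-in for format detection; it can return something
 * other than HTML if the cache file happens to have another suffix.
 */
static DemoFormat demo_guess_format(const char *filename)
{
    const char *dot = strrchr(filename, '.');

    if (dot != NULL &&
        (strcmp(dot, ".html") == 0 || strcmp(dot, ".htm") == 0))
        return DEMO_WWW_HTML;
    return DEMO_WWW_PLAINTEXT;
}

/*
 * For cached source the answer is already known, so it can simply be
 * stated instead of being rediscovered.
 */
static DemoFormat demo_cached_source_format(void)
{
    return DEMO_WWW_HTML;
}
```

With a cache file named, say, "cache0001.tmp" (an invented name), the
guessing function would not report HTML, while the explicit one always
does.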

> > The more fundamental problem is that the source cache _relies_ on the
> > HTMainAnchor->charset to be still present (and valid) when
> > HTparse_document gets called, instead of keeping track of it on its own.
> > (A 'real' cache also has to cache meta-information.)  That's not
> > always reliable - the HTMainAnchor->charset may disappear in various
> > situations.
> 
>   In which situations can HTMainAnchor->charset be lost? How common are
> such situations?

See below.

>   But I agree that it's better to save charset info somehow.
> 
> > As a demonstration of the last point, try the following (I haven't
> > done exactly this):  Choose a HTTP HTML document ('A') that needs
> > translation, and which has the charset in a META tag (_not_ in the
> > HTTP header).
> > - Switch to source mode and back several times - everything should
> >   work.  (End in normal mode).
> > - Press 'V' (or anything that will give you a link to A).
> > - Press 'd' on the link to A.  Let the download proceed so that you
> >   get to the Download Options page.
> > - Don't do anything on the D.O. page - or do, if you like -, just
> >   go back with Left Arrow (twice) to A.
> > - NOW toggle '\' again.   What do you see?
> 
>   I tried with patched lynx - info page says that charset is the same, but it
> adds that it's assumed (but display is not corrupted - correct translation is 
> used). What should I see?

The '(assumed)' indicates that HTMainAnchor->charset has disappeared.
If the character translation still is done right, that must then be
because it was still saved in the HTAnchor->UCStages...  which happens
to survive the re-loading (it would be cleared when _not_ going to
source mode:
        /*
         * This magic FREE(anchor->UCStages) call
         * stolen from HTuncache_current_document() above.
         */
        if (!(HTOutputFormat && HTOutputFormat == WWW_SOURCE)) {
            FREE(HTMainAnchor->UCStages);
        }
).  So in this case, things work out; it's just a demonstration that
anchor->charset can't always be relied on to remain in HTMainAnchor.
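
As a rough sketch of what "keeping track of it on its own" could mean
(this is not Lynx code; DemoAnchor, DemoCacheEntry and the demo_*
functions are hypothetical, and the real anchor structure has many more
fields): the cache entry stores its own copy of the charset when the
source is cached, and puts it back if the anchor has lost it by the
time the document is reparsed.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an anchor; NOT Lynx's real anchor structure. */
typedef struct {
    char *charset;              /* may be cleared by unrelated operations */
} DemoAnchor;

/* Hypothetical cache entry that keeps meta-information with the source. */
typedef struct {
    char *body;                 /* cached document source */
    char *charset;              /* charset known at cache time, or NULL */
} DemoCacheEntry;

/* Duplicate a string; returns NULL for NULL input or on allocation failure. */
static char *demo_strdup(const char *s)
{
    char *copy;

    if (s == NULL)
        return NULL;
    copy = malloc(strlen(s) + 1);
    if (copy != NULL)
        strcpy(copy, s);
    return copy;
}

/* Store the source together with the charset the anchor has right now. */
static void demo_cache_store(DemoCacheEntry *entry, const char *body,
                             const DemoAnchor *anchor)
{
    entry->body = demo_strdup(body);
    entry->charset = demo_strdup(anchor->charset);
}

/*
 * Before reparsing: if the anchor lost its charset, restore it from the
 * cache entry's own copy.  Returns the charset to use (may be NULL).
 */
static const char *demo_cache_charset(const DemoCacheEntry *entry,
                                      DemoAnchor *anchor)
{
    if (anchor->charset == NULL && entry->charset != NULL)
        anchor->charset = demo_strdup(entry->charset);
    return anchor->charset;
}

/* Release everything the cache entry owns. */
static void demo_cache_free(DemoCacheEntry *entry)
{
    free(entry->body);
    free(entry->charset);
    entry->body = NULL;
    entry->charset = NULL;
}
```

The point of the design is only that the cache does not depend on
whatever else happens to the anchor between caching and reparsing.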

   Klaus

