On Fri, 31 Mar 2000, Klaus Weide wrote:

> On Fri, 31 Mar 2000, Vlad Harchev wrote:
> 
> > 
> > * Fixed problem with charset handling for SOURCE_CACHE:MEMORY - if the charset
> >   was specified by Content-Type http header it won't be lost (as it was) when
> >   reparsing from source cache (applies to toggling DTD, switching to srcview,
> >   and other stuff like *,' etc).
> 
> >  Notes: 
> > 2) As for SOURCE_CACHE:MEMORY charset loss problem - I copied the piece of
> >    code from the SOURCE_CACHE:FILE branch and everything seems to work fine
> >    now (switching to src view and back in particular).
> 
> The piece of code didn't really make sense in the SOURCE_CACHE:FILE branch,
> either.  The comment says:
>         /*
>          * This is more or less copied out of HTLoadFile(), except we don't
>          * get a content encoding.  This may be overkill.  -dsb
>          */
> There is no need to use the logic copied from HTFile.c here.  Its purpose
> is to determine the 'format'.  But (as long as the source cache is only
> used as it is now), we already know that it is WWW_HTML.  The
> SOURCE_CACHE:MEMORY branch just sets this explicitly.  The SOURCE_CACHE:FILE
> branch doesn't; at best, and normally, it will come to exactly the same
> conclusion;

 Which "same conclusion" - the correct one?

> at worst, it will come to the wrong conclusion (i.e., some 'format' other
> than WWW_HTML).

   You are probably right (I don't understand this stuff completely; my
approach was rather "trial and error"), but with my patch lynx behaves much
better than before (the document charset is not lost with SOURCE_CACHE:FILE).
Could you implement this fix correctly, as you propose, if you consider my
patch insufficient?

> The 'problem' is caused by the call to HTCharsetFormat.  It is pointless
> as far as I can see, since the 'format' passed to it should never have
> a ';charset=xxxx' component at that point.  But it has the side effect
> of removing the explicit charset info from the anchor
> ('FREE(anchor->charset);').  That's what you are trying to avoid.
> 
> You _do_ manage to avoid HTCharsetFormat(), since in the normal case
> HTMainAnchor->content_type will be present.  But if the call is never
> useful, it should just be removed from HTreparse_document.

  Yes, it seems the "else" branch is never executed. It should probably be
removed, but I think we should do that only in the next development cycle.
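
  To make sure I understand the side effect, here is a small standalone
sketch (not lynx code). The Anchor struct, charset_format_stub() and
reparse_format() are hypothetical names of mine; the stub models only the
'FREE(anchor->charset)' behaviour described above under an assumed
condition, and the string "text/html" stands in for WWW_HTML.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the anchor; only the field discussed here. */
typedef struct {
    char *charset;      /* explicit charset, or NULL if unknown */
} Anchor;

/*
 * Stub that models ONLY the side effect described above: when the format
 * string carries no charset parameter, the anchor's charset is freed.
 * This is an assumed simplification, not the real HTCharsetFormat().
 */
const char *charset_format_stub(const char *format, Anchor *anchor)
{
    assert(format != NULL && anchor != NULL);
    if (strstr(format, "charset=") == NULL) {
        free(anchor->charset);
        anchor->charset = NULL;
    }
    return format;
}

/*
 * Reparse path without the call: the cached source is known to be HTML,
 * so the format is set directly and the anchor is left untouched.
 */
const char *reparse_format(Anchor *anchor)
{
    assert(anchor != NULL);
    return "text/html";
}
```

  If the real function behaves like the stub, then dropping the call from
the reparse path is enough to keep the anchor's charset intact.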

> The more fundamental problem is that the source cache _relies_ on the
> HTMainAnchor->charset to be still present (and valid) when
> HTreparse_document gets called, instead of keeping track of it on its own.
> (A 'real' cache also has to cache meta-information.)  That's not
> always reliable - the HTMainAnchor->charset may disappear in various
> situations.

  In which situations can HTMainAnchor->charset be lost? How common are such
situations?
  But I agree that it's better to save the charset info somehow.
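
  Roughly, I imagine something like the sketch below: the cache entry keeps
its own copy of the charset and puts it back into the anchor before
reparsing. This is only an illustration; CacheEntry, Anchor, cache_store()
and cache_restore_charset() are hypothetical names, not existing lynx
structures or functions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cache entry: not an actual lynx structure. */
typedef struct {
    char *source;       /* cached document source */
    char *charset;      /* charset known when the entry was stored, or NULL */
} CacheEntry;

/* Hypothetical stand-in for the anchor; only the field discussed here. */
typedef struct {
    char *charset;
} Anchor;

/* Duplicate a string; NULL in gives NULL out. */
static char *dup_or_null(const char *s)
{
    char *copy;

    if (s == NULL)
        return NULL;
    copy = malloc(strlen(s) + 1);
    if (copy != NULL)
        strcpy(copy, s);
    return copy;
}

/* Store: take a private copy of the charset along with the source. */
void cache_store(CacheEntry *entry, const char *source, const Anchor *anchor)
{
    assert(entry != NULL && source != NULL && anchor != NULL);
    entry->source = dup_or_null(source);
    entry->charset = dup_or_null(anchor->charset);
}

/* Reparse: put the cached charset back if the anchor has lost it. */
void cache_restore_charset(const CacheEntry *entry, Anchor *anchor)
{
    assert(entry != NULL && anchor != NULL);
    if (anchor->charset == NULL && entry->charset != NULL)
        anchor->charset = dup_or_null(entry->charset);
}
```

  With this, the reparse would no longer depend on the anchor's charset
surviving whatever happened between loading and reparsing. Whether an
explicit charset and an assumed one should be stored differently is
something the sketch does not address.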

> As a demonstration of the last point, try the following (I haven't
> done exactly this):  Choose a HTTP HTML document ('A') that needs
> translation, and which has the charset in a META tag (_not_ in the
> HTTP header).
> - Switch to source mode and back several times - everything should
>   work.  (End in normal mode).
> - Press 'V' (or anything that will give you a link to A).
> - Press 'd' on the link to A.  Let the download proceed so that you
>   get to the Download Options page.
> - Don't do anything on the D.O. page - or do, if you like -, just
>   go back with Left Arrow (twice) to A.
> - NOW toggle '\' again.   What do you see?

  I tried this with the patched lynx: the info page says that the charset is
the same, but adds that it is assumed (the display is not corrupted, though;
the correct translation is used). What should I see?

>[...]

 Best regards,
  -Vlad
