Hi Eric,

> How much faster are Unicode and HTML entity translation now?

The table lookup is now about three times faster. However, there are still 
some bottlenecks in the tag / entity substitution code (not in the UTF to 
codepage translation). The code makes heavy use of memmove and realloc because 
it does the substitutions in place. I have some ideas on how to speed this up 
further.
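
To illustrate the general direction, here is a minimal sketch. It is not the 
actual HTMLHELP code: the entity table and the name decode_entities are made 
up for this example. It assumes that every replacement is shorter than the 
entity it replaces, which holds for entity to single character substitution 
but not necessarily for every tag substitution. Under that assumption the 
write position never overtakes the read position, so one pass over the buffer 
is enough and neither memmove nor realloc is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entity table, for illustration only. */
struct entity { const char *name; char ch; };

static const struct entity entities[] = {
    { "amp", '&' }, { "lt", '<' }, { "gt", '>' }, { "quot", '"' }
};

/* Decode known "&name;" entities in a NUL-terminated buffer, in place and
   in a single pass. Unknown entities are copied unchanged.
   Returns the new string length. */
size_t decode_entities(char *buf)
{
    const size_t n = sizeof entities / sizeof entities[0];
    char *src = buf;   /* read position */
    char *dst = buf;   /* write position, always <= src */

    assert(buf != NULL);
    while (*src) {
        size_t i = n;  /* n means "no entity matched here" */
        if (*src == '&') {
            for (i = 0; i < n; i++) {
                size_t len = strlen(entities[i].name);
                if (strncmp(src + 1, entities[i].name, len) == 0 &&
                    src[1 + len] == ';')
                    break;
            }
        }
        if (i < n) {
            *dst++ = entities[i].ch;
            src += strlen(entities[i].name) + 2;  /* skip '&', name, ';' */
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return (size_t)(dst - buf);
}
```

Each input byte is read once and written at most once, so the cost is linear 
in the document size, whereas one memmove per substitution can become 
quadratic for entity-heavy documents.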

Fritz Müller also observed that scrolling and navigation are laggy if the 
documents contain many links. So this is another area for improvement.

> I would NOT remove the ZIP feature, because a typical help zip
> contains more than 100 files, with a median length of just 1 kB
> uncompressed. This would waste a lot of disk space with large
> clusters and DOS itself also is slow with large directories.

The space wasted by many small files on large clusters also came to my mind, 
and it was one of several reasons that made me propose switching to the .PAK 
container format. But I understand that this file format might be too exotic, 
and delivering the FreeDOS help as an uncompressed ZIP might be a good 
compromise.
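
For a feeling of the numbers, here is a small back-of-the-envelope sketch. 
The 32 KiB cluster size is an assumption (it is what FAT16 uses on partitions 
between 1 and 2 GB), all files are treated as exactly 1 kB for simplicity, and 
the function names are made up for this example.

```c
#include <assert.h>

/* Bytes a file of `size` bytes occupies on disk when space is allocated
   in whole clusters of `cluster` bytes. */
unsigned long on_disk_size(unsigned long size, unsigned long cluster)
{
    assert(cluster != 0);
    return ((size + cluster - 1) / cluster) * cluster;
}

/* Bytes allocated to the file but not used by its contents. */
unsigned long slack(unsigned long size, unsigned long cluster)
{
    return on_disk_size(size, cluster) - size;
}
```

With these assumptions, slack(1024, 32768) is 31744 bytes per file, so 100 
such files waste about 3 MB. The same 100 kB stored in one container file 
(ignoring the container's own headers) occupies 4 clusters, i.e. 128 KiB.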

> Maybe the search could avoid reopening the zip many times?

I see no reason why this should not be possible, and will put it on my list for 
one of the next interim releases.

> If you want less unpack-overhead, you could even concat
> the help texts after the binary in some way and then UPX it.
> We have tools for similar tricks in FreeCOM, I believe. As
> help is small, everything could fit in RAM after loading.

Sadly the help texts are not that small: they add up to over a megabyte for 
the FreeDOS help. So one would have to resort to XMS or another technique to 
hold them in RAM. Probably not worth the effort. HTMLHELP itself is also heavy 
on memory consumption, which is why I wanted to get rid of the included ZIP 
code and strip out everything not strictly needed :-) But in light of HTMLHELP 
being a generic viewer, I now think it is better not to do that.

> I would not store Unicode translation tables in a central
> place. Maybe you could share tables already used by another
> app or driver in some way, but this is too exotic to have
> a central component a la ICONV or RECODE, I believe.

Yes, maybe. There are also different strategies for doing it, leading to 
different tables. For example, HTMLHELP is currently restricted to 
character-to-character conversions and cannot handle character-to-multi-character 
conversions (useful when translating emojis etc.).
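
To make the difference concrete, here is a minimal sketch of a table that 
maps one code point to a string of one or more codepage bytes. This is not 
how HTMLHELP stores its tables: the struct, the names demo_map and lookup, 
and the multi-character replacements are assumptions for illustration. Only 
the two single-byte entries are real CP437 values.

```c
#include <assert.h>
#include <stddef.h>

/* One code point maps to a NUL-terminated string of codepage bytes. */
struct cp_map { unsigned long cp; const char *text; };

/* Must stay sorted by code point, because lookup() uses binary search. */
static const struct cp_map demo_map[] = {
    { 0x00E4UL,  "\x84" },  /* a umlaut, CP437 byte 0x84 */
    { 0x00FCUL,  "\x81" },  /* u umlaut, CP437 byte 0x81 */
    { 0x2026UL,  "..."  },  /* ellipsis, hypothetical replacement */
    { 0x20ACUL,  "EUR"  },  /* euro sign, hypothetical replacement */
    { 0x1F600UL, ":-)"  }   /* grinning face, hypothetical replacement */
};
#define DEMO_MAP_LEN (sizeof demo_map / sizeof demo_map[0])

/* Return the replacement text for `cp`, or NULL if it is not in the table. */
const char *lookup(const struct cp_map *tab, size_t n, unsigned long cp)
{
    size_t lo = 0, hi = n;

    assert(tab != NULL);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].cp == cp)
            return tab[mid].text;
        if (tab[mid].cp < cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    return NULL;
}
```

The price of this flexibility is a pointer per entry plus the string data, 
where a pure character-to-character table needs only a single byte per entry. 
That is the kind of trade-off that makes the tables hard to share.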

> You already wanted to check different strategies for tables,
> so maybe you can come up with an encoding which is both good
> for speed and for memory footprint?

I think you are referring to my thoughts on whether the memory consumption of 
the translation string tables can be optimized for language translations, 
especially by getting rid of the original language strings compiled into the 
binary. I have some possible solutions in mind, but DOS purists are gonna hate 
me for the one I find technically most exciting :-) It basically boils down to 
abusing the NE file format and the existing toolchain ecosystem: translation 
tables go into string resources, and the default MZ stub is replaced by a 
custom loader doing the relocation etc. But that’s another topic...

> I think it is good that HTMLHELP can be used as generic
> viewer for simple HTML hypertext packages, either in ZIP
> or as individual files, so I also think that it is good
> that it can handle HTML entities and Unicode on the fly.

Agreed.

> 
> AMB probably goes the other direction: Files have to be
> pre-compiled with extra tools, but the viewer is small.

Yes, it is a fine piece of software. In my opinion it does its job as a 
FreeDOS help viewer very well.

Greetings, Bernd



_______________________________________________
Freedos-devel mailing list
Freedos-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/freedos-devel
