Hi Eric,

> How much faster are Unicode and HTML entity translation now?

The speed-up of the table lookup is about a factor of 3. However, there are still some bottlenecks in the tag / entity substitution code (not the UTF to codepage translation). The code makes heavy use of memmove and realloc, doing the substitutions in-place. I have some ideas on how to speed this up further.

Fritz Müller also observed that scrolling and navigation are laggy if the documents contain many links. So this is also an area for improvement.

> I would NOT remove the ZIP feature, because a typical help zip
> contains more than 100 files, with a median length of just 1 kB
> uncompressed. This would waste a lot of disk space with large
> clusters and DOS itself also is slow with large directories.

The space wasted by many small files on large clusters had also occurred to me, and it was one of several reasons why I proposed switching to the .PAK container format. But I understand that this file format might be too exotic, and delivering the FreeDOS help as an uncompressed ZIP might be a good compromise.

> Maybe the search could avoid reopening the zip many times?

I see no reason why this should not be possible, and will put it on my list for one of the next interim releases.

> If you want less unpack-overhead, you could even concat
> the help texts after the binary in some way and then UPX it.
> We have tools for similar tricks in FreeCOM, I believe. As
> help is small, everything could fit in RAM after loading.

Sadly the help texts are not that small: they add up to over a megabyte for the FreeDOS help. So one would have to resort to XMS or another technique to hold them in RAM. Probably not worth the effort.

HTMLHELP itself is also heavy on memory consumption, which is why I wanted to get rid of the included ZIP code and strip out everything not strictly needed :-) But given that htmlhelp is meant to be a generic viewer, I now think it is better not to do that.

> I would not store Unicode translation tables in a central
> place.
> Maybe you could share tables already used by another
> app or driver in some way, but this is too exotic to have
> a central component a la ICONV or RECODE, I believe.

Yes, maybe. There are also different strategies for doing it, leading to different tables. For example, HTMLHELP is currently restricted to character to character conversions and cannot handle character to multi-character conversions (useful when translating emojis etc.).

> You already wanted to check different strategies for tables,
> so maybe you can come up with an encoding which is both good
> for speed and for memory footprint?

I think you are referring to my thoughts on whether it is possible to reduce the memory consumption of the translation string tables used for language translations, especially by getting rid of the original language strings compiled into the binary. I have some possible solutions in mind, but DOS purists are gonna hate me for the one I find most technically exciting :-) It basically boils down to abusing the NE file format and the existing toolchain ecosystem. Translation tables go into string resources, the default MZ stub is replaced by a custom loader doing the relocation etc. But that’s another topic...

> I think it is good that HTMLHELP can be used as generic
> viewer for simple HTML hypertext packages, either in ZIP
> or as individual files, so I also think that it is good
> that it can handle HTML entities and Unicode on the fly.

Agreed.

> AMB probably goes the other direction: Files have to be
> pre-compiled with extra tools, but the viewer is small.

Yes, a fine piece of software. In my opinion it does its job as a FreeDOS help viewer very well.

Greetings,
Bernd

_______________________________________________
Freedos-devel mailing list
Freedos-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/freedos-devel
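
A note on the memmove / realloc bottleneck mentioned above. The following is only a minimal sketch of one possible approach, not the actual HTMLHELP code. It assumes that every entity maps to a single code page byte (in line with the character to character restriction described above), so the decoded text can never be longer than the source. Under that assumption a read pointer and a write pointer can walk the buffer once, and no memmove or realloc is needed. The table contents and the names entity_table, lookup_entity, decode_entities and MAX_ENTITY_LEN are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTITY_LEN 8   /* assumed upper bound for entity name length */

/* Hypothetical entity table: name -> single code page byte. */
struct entity { const char *name; unsigned char ch; };

static const struct entity entity_table[] = {
    { "amp", '&' }, { "lt", '<' }, { "gt", '>' }, { "quot", '"' }
};

/* Return the byte for an entity name of length len, or 0 if unknown. */
static unsigned char lookup_entity(const char *name, size_t len)
{
    size_t i;
    for (i = 0; i < sizeof entity_table / sizeof entity_table[0]; i++) {
        if (strlen(entity_table[i].name) == len &&
            memcmp(entity_table[i].name, name, len) == 0)
            return entity_table[i].ch;
    }
    return 0;
}

/* Decode entities in place in a single pass. The write pointer never
   overtakes the read pointer because each replacement is shorter than
   the entity it replaces. Returns the new string length. */
size_t decode_entities(char *s)
{
    char *rd = s;
    char *wr = s;

    assert(s != NULL);
    while (*rd) {
        if (*rd == '&') {
            size_t n = 0;
            unsigned char ch = 0;
            /* Scan at most MAX_ENTITY_LEN name characters for ';'. */
            while (n < MAX_ENTITY_LEN && rd[1 + n] && rd[1 + n] != ';')
                n++;
            if (rd[1 + n] == ';')
                ch = lookup_entity(rd + 1, n);
            if (ch) {
                *wr++ = (char)ch;
                rd += n + 2;        /* skip '&', the name and ';' */
                continue;
            }
        }
        *wr++ = *rd++;              /* ordinary byte or unknown entity */
    }
    *wr = '\0';
    return (size_t)(wr - s);
}
```

Unknown entities and a stray '&' are copied through unchanged. A linear table search is used only to keep the sketch short; a sorted table with binary search would be the obvious next step.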