Re: [Lynx-dev] Update List of Character Entity Names

Brian Inglis Mon, 13 Jan 2025 06:32:28 -0800

On 2025-01-13 02:07, Thomas Dickey wrote:

On Fri, Jan 10, 2025 at 09:56:12PM -0700, Brian Inglis wrote:

On 2025-01-09 14:34, Thomas Dickey wrote:

On Thu, Jan 09, 2025 at 11:15:23AM -0700, Brian Inglis wrote:

Hi folks,


Many sites are now using Character Entity Names defined under

        https://www.w3.org/TR/xml-entity-names/


https://www.w3.org/TR/xml-entity-names/#source


        https://www.w3.org/TR/xml-entity-names/bycodes.html

        https://www.w3.org/TR/xml-entity-names/byalpha.html

The former is about 184KB and the latter about 386KB, with a lot of HTML
overhead.
As they have to index character name strings, not just code point combinations,
they probably need about an order of magnitude more space than the compose data,
whose ~50KB of source, with lots of overhead, is actually ~8KB.


I see.  I'm expecting other issues with zero-width-whatever, but will
(after current work on cdk & dialog) see about making a script to extract
the data from bycodes.html.


Hopefully those are no more onerous than zero width &nbsp;?

Thanks for considering it, and your work on these libraries and utilities.
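For what it's worth, the extraction script Thomas mentions might start from
something like this minimal Python sketch. It assumes, without having checked
the real page, that bycodes.html is a table in which each row's first cell
holds the code point(s) written as U+XXXX and its third cell holds the
whitespace-separated entity name(s), with explicit closing tags. The column
indexes are parameters in case that guess is wrong, and `extract_entities` is
a hypothetical helper, not anything in Lynx.

```python
# Hypothetical sketch: pull entity-name -> code point pairs out of an HTML
# table such as bycodes.html.
# ASSUMPTION (unverified): code points in column 0 as U+XXXX, entity names in
# column 2, and every <tr>/<td>/<th> has an explicit closing tag.
import re
from html.parser import HTMLParser

# "U+" followed by 4 to 6 hex digits (enough for U+10FFFF).
CODEPOINT_RE = re.compile(r"U\+([0-9A-Fa-f]{4,6})")


class RowCollector(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows; each row is a list of cell strings
        self._row = None  # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text outside a cell (e.g. newlines between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)


def extract_entities(html_text, code_col=0, name_col=2):
    """Return (name, (codepoint, ...)) pairs from the assumed table layout."""
    parser = RowCollector()
    parser.feed(html_text)
    parser.close()
    result = []
    for row in parser.rows:
        if len(row) <= max(code_col, name_col):
            continue  # too few cells to match the assumed layout
        codes = tuple(int(h, 16) for h in CODEPOINT_RE.findall(row[code_col]))
        if not codes:
            continue  # header row or a row without code points
        # One row may list several names for the same code point sequence.
        for name in row[name_col].split():
            result.append((name, codes))
    return result
```

Something like `extract_entities(open("bycodes.html", encoding="utf-8").read())`
would then give (name, code points) pairs that could be sorted by name and
emitted as a compact C table, leaving the HTML overhead behind.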

--
Take care. Thanks, Brian Inglis              Calgary, Alberta, Canada

La perfection est atteinte                   Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
mais lorsqu'il n'y a plus rien à retrancher  but when there is no more to cut
                                -- Antoine de Saint-Exupéry
