Hannes,

Would a more compact representation be two tables, one for single-character entities and the other for multi-character entities?

Is that worth considering? I guess that until we have value types, we would still have to box the single-character ones, but a Character should still be smaller than a String, right?
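The patch stores every entity value as a String in one static Map. Here is a rough sketch of the two-table alternative. The class name, method name, and sample entries are hypothetical and are not taken from the webrev. The samples come from the HTML 5 named character reference list: "nvlt" expands to two code points, and "Afr" is a single supplementary code point that needs a surrogate pair.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: entities split into two tables by the length of their expansion.
public class TwoTableEntities {
    // Entities whose value is exactly one UTF-16 char; values are boxed Characters.
    private static final Map<String, Character> SINGLE = new HashMap<>();
    // Entities whose value needs more than one char (two code points, or a surrogate pair).
    private static final Map<String, String> MULTI = new HashMap<>();

    static {
        SINGLE.put("amp", '&');
        SINGLE.put("lt", '<');
        SINGLE.put("copy", '\u00A9');
        // "<" followed by U+20D2 COMBINING LONG VERTICAL LINE OVERLAY
        MULTI.put("nvlt", "<\u20D2");
        // U+1D504 MATHEMATICAL FRAKTUR CAPITAL A, two chars in UTF-16
        MULTI.put("Afr", new String(Character.toChars(0x1D504)));
    }

    /** Returns the expansion of the named entity, or null if the name is unknown. */
    public static String lookup(String name) {
        Character c = SINGLE.get(name);
        if (c != null) {
            return c.toString();
        }
        return MULTI.get(name);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"amp", "copy", "nvlt", "Afr", "bogus"}) {
            String value = lookup(name);
            System.out.println(name + " -> " + (value == null
                    ? "unknown"
                    : value.codePointCount(0, value.length()) + " code point(s)"));
        }
    }
}
```

One detail helps the single-character table: autoboxing a char goes through Character.valueOf, which always caches U+0000 through U+007F. ASCII-valued entities therefore share boxes, and only the non-ASCII ones pay for a separate Character object.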

-- Jon

On 6/12/19 1:10 PM, Hannes Wallnöfer wrote:
Please review:

JBS: https://bugs.openjdk.java.net/browse/JDK-8225671
Webrev: http://cr.openjdk.java.net/~hannesw/8225671/webrev.00/

This is the second attempt at supporting HTML 5 entities after JDK-8222318 had 
to be reverted.

Fortunately, contrary to what I had assumed, I didn't have to keep the HTML 4
entities around after all; I had just been thoroughly confused by the test output.

Given the huge increase in the number of entities, I decided to switch from an
enum to a plain class with a static Map. Entity values are now stored as strings,
since some entities expand to two code points. Also, we no longer need the
reverse table to look up numeric entities, as HTML 5 has a concise definition
of valid numeric entities [1].

[1]: https://www.w3.org/TR/html52/syntax.html#character-references
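A direct definition of validity means a numeric reference can be checked arithmetically, with no table at all. The sketch below assumes the following reading of [1]: a numeric reference is valid unless it refers to U+0000, U+000D, a surrogate, a noncharacter, a code point above U+10FFFF, or a control character other than ASCII whitespace. The class and method names are hypothetical and do not come from the webrev. The code only decides validity. It does not model the replacements the HTML parser applies to some invalid references, such as C1 controls.

```java
// Hypothetical sketch: validity of numeric character references under an
// assumed reading of the HTML 5 rules in [1].
public class NumericEntities {

    /** True if c is an ASCII digit, or also an ASCII hex letter when hex is set. */
    private static boolean isAsciiDigit(char c, boolean hex) {
        if (c >= '0' && c <= '9') {
            return true;
        }
        return hex && ((c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F'));
    }

    /**
     * Parses the text between "&" and ";", e.g. "#169" or "#xA9", and returns
     * the code point, or -1 if it is malformed or does not fit in an int.
     */
    public static int parse(String ref) {
        if (ref.length() < 2 || ref.charAt(0) != '#') {
            return -1;
        }
        boolean hex = ref.charAt(1) == 'x' || ref.charAt(1) == 'X';
        String digits = ref.substring(hex ? 2 : 1);
        if (digits.isEmpty()) {
            return -1;
        }
        // Integer.parseInt would also accept signs and non-ASCII digits, so check first.
        for (int i = 0; i < digits.length(); i++) {
            if (!isAsciiDigit(digits.charAt(i), hex)) {
                return -1;
            }
        }
        try {
            return Integer.parseInt(digits, hex ? 16 : 10);
        } catch (NumberFormatException e) {
            return -1; // overflow: far beyond U+10FFFF anyway
        }
    }

    /** True if cp may be referenced numerically under the assumed rules. */
    public static boolean isAllowed(int cp) {
        if (cp <= 0 || cp > Character.MAX_CODE_POINT) {
            return false; // U+0000, a malformed reference (-1), or beyond U+10FFFF
        }
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return false; // surrogates
        }
        if ((cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE) {
            return false; // noncharacters, including the last two code points of every plane
        }
        // Reject C0 controls, DEL and C1 controls except TAB, LF and FF; this also rejects CR.
        return !Character.isISOControl(cp) || cp == 0x09 || cp == 0x0A || cp == 0x0C;
    }

    public static void main(String[] args) {
        String[] refs = {"#169", "#xA9", "#x1F600", "#9", "#13", "#x80", "#xD800", "#xFFFE", "#x110000", "#x"};
        for (String ref : refs) {
            System.out.println("&" + ref + "; -> " + (isAllowed(parse(ref)) ? "valid" : "invalid"));
        }
    }
}
```

The test groups described below map directly onto the branches of isAllowed: control characters, surrogates, and noncharacters each have their own rejection check.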

I updated the test with entities from all relevant groups (new valid named and
numeric entities, and invalid numeric entities referring to control characters,
surrogates, and noncharacters). I also tested these manually using the W3C HTML
validator [2].
Mach5 tier 1 tests also pass.

[2]: https://validator.w3.org/

Hannes
