On 24.04.2010 01:09, Xueming Shen wrote:
> Yes, the final table takes about 500k; we might consider using a weakref or something if memory is really a concern. But the table gets initialized only if you invoke Character.getName().
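For illustration, the "weakref or something" idea could look like the sketch below; a SoftReference is usually the better fit for a cache, since a weakly referenced table would be collected almost immediately. The class and method names (NameTableHolder, buildTable) are my own, not from the attached code.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

class NameTableHolder {
    // The table is built on first use and may be reclaimed by the GC
    // under memory pressure; the next call simply rebuilds it.
    private static SoftReference<Map<Integer, String>> cache;

    static synchronized Map<Integer, String> table() {
        Map<Integer, String> t = (cache != null) ? cache.get() : null;
        if (t == null) {
            t = buildTable();                 // expensive one-time initialization
            cache = new SoftReference<>(t);
        }
        return t;
    }

    // Placeholder for reading the real Unicode name data.
    private static Map<Integer, String> buildTable() {
        Map<Integer, String> t = new HashMap<>();
        t.put(0x41, "LATIN CAPITAL LETTER A");
        return t;
    }
}
```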
Sherman, how did you compute that value?
- A Map.Entry object counts 24 bytes (40 on a 64-bit machine).
- An Integer object for the key counts 12 bytes (20 on a 64-bit machine).
- A String object counts 36 + 2*length bytes, so for an average character name length of 24: 84 bytes (98 on a 64-bit machine).
--> One character name in the HashMap would count, including bucket overhead, ~135 bytes (~170 on a 64-bit machine).
--> 20,000 character names would count ~2.7 MByte (~3.4 on a 64-bit machine).

See my new version in the attachment. I estimate:
- for byte[] names: 480,000 bytes
- for int[][] indexes:
-- base array with 4353 elements: 17,420 bytes
-- one int[] index per block, with an average length of 32: 140 bytes
-- sum: 626,700 bytes
- overall sum: 1,106,700 bytes (small enough)

If the block size were smaller than 256, I guess it would be even less (at the cost of slightly decreased performance).
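In miniature, the byte[]-plus-int[][] layout estimated above could look like this sketch: all names concatenated into one length-prefixed ASCII byte array, with a two-level index of one int[] slot per 256-code-point block (blocks without names stay null). The class name CharNames and the exact encoding are assumptions for illustration, not the attached implementation.

```java
import java.nio.charset.StandardCharsets;

class CharNames {
    private final byte[] names;      // length-prefixed, concatenated ASCII names
    private final int[][] indexes;   // indexes[cp >> 8][cp & 0xFF] = offset, or -1

    CharNames(byte[] names, int[][] indexes) {
        this.names = names;
        this.indexes = indexes;
    }

    // Look up the name of a code point: two array accesses, then one
    // String instantiation from the shared byte array.
    String getName(int cp) {
        int[] block = indexes[cp >> 8];
        if (block == null) return null;
        int off = block[cp & 0xFF];
        if (off < 0) return null;
        int len = names[off] & 0xFF;           // first byte holds the length
        return new String(names, off + 1, len, StandardCharsets.US_ASCII);
    }
}
```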
- Initializing the indexes array should be *much* faster than filling the hash map.
- Retrieving an index should be a little faster or equivalent, but the cost of instantiating one new String object must be added.
We could go further:
- separate caches (and data files) for the 17 Unicode planes
- calculate short 1- or 2-byte keys for textual words and frequent phrases; I estimate there are 1000..4000 different words and 100..300 redundant phrases in the data.
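The word-key idea could be sketched roughly as follows: collect the distinct words of all names, give each a small integer key, and store every name as a key sequence instead of raw text. With 1000..4000 distinct words a key fits in two bytes (one for the most frequent words). The names NameCompressor, buildDictionary, encode, and decode are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

class NameCompressor {
    // Assign each distinct word a key in order of first appearance.
    static Map<String, Integer> buildDictionary(List<String> names) {
        Map<String, Integer> dict = new LinkedHashMap<>();
        for (String name : names)
            for (String word : name.split(" "))
                dict.putIfAbsent(word, dict.size());
        return dict;
    }

    // Replace each word of a name by its dictionary key.
    static int[] encode(String name, Map<String, Integer> dict) {
        String[] words = name.split(" ");
        int[] keys = new int[words.length];
        for (int i = 0; i < words.length; i++) keys[i] = dict.get(words[i]);
        return keys;
    }

    // Rebuild the name from its key sequence and the word list.
    static String decode(int[] keys, List<String> wordList) {
        StringJoiner sj = new StringJoiner(" ");
        for (int k : keys) sj.add(wordList.get(k));
        return sj.toString();
    }
}
```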
Are you interested in that? We could add (attention: CCC change) a cacheCharacterNames(boolean yesNo) method to serve users who need this functionality heavily.
-Ulf
Attachment: CharacterName2.java