I have corrected the statistics:

Current code from Sherman:
- A Map.Entry object takes 24 bytes (40 on a 64-bit machine)
- An Integer object for the key takes 12 bytes (20 on a 64-bit machine)
- A String object takes 36 + 2*length bytes, so for an average character name length of 26:
      88 bytes (102 on a 64-bit machine)
--> one character name in the HashMap would take, including bucket overhead, ~140 bytes (~170 on a 64-bit machine)
--> 20,000 character names would take ~2.8 MByte (~3.4 on a 64-bit machine)
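
The arithmetic behind this estimate can be sketched in a few lines of Java. This is only a back-of-the-envelope check, not a measurement: the per-object sizes are the 32-bit figures quoted above, and the class and method names (NameMapEstimate, stringBytes, perNameBytes, totalBytes) are made up for illustration. The 36 + 2*length formula is applied to the 32-bit case only.

```java
// Rough footprint estimate for a HashMap<Integer, String> of character names.
// All sizes are the figures quoted in the mail (32-bit), not measured values.
final class NameMapEstimate {

    // String cost as quoted: 36 bytes fixed plus 2 bytes per char.
    static int stringBytes(int length) {
        return 36 + 2 * length;
    }

    // Map.Entry + Integer key + String value, before bucket overhead.
    static int perNameBytes(int entryBytes, int integerBytes, int nameLength) {
        return entryBytes + integerBytes + stringBytes(nameLength);
    }

    // Total for a given per-name cost (bucket overhead already included).
    static long totalBytes(int perNameWithOverhead, int names) {
        return (long) perNameWithOverhead * names;
    }

    public static void main(String[] args) {
        int raw = perNameBytes(24, 12, 26);
        System.out.println("per name before bucket overhead: " + raw + " bytes");
        System.out.println("20,000 names at ~140 bytes each: "
                + totalBytes(140, 20_000) + " bytes");
    }
}
```

The difference between the raw per-name sum and the ~140 bytes used above is the share of the bucket array attributed to each entry.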


Measurements from my latest version:
- for byte[] names: 509,554 bytes
- for int[][] indexes:
-- base array with 4353 elements: 17,420 bytes
-- one int[] index for a block with an average length of 220: 892 bytes
-- sum: 17,420 + 97 * 892 bytes = 103,944 bytes
overall sum: 613,498 bytes (pretty good)
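
As a cross-check of the sums above, here is a minimal Java sketch. It only re-adds the byte counts given in this mail; the class and method names (CompactLayoutEstimate, indexBytes, totalBytes) are hypothetical and not taken from the actual patch.

```java
// Re-adds the figures quoted above for the byte[] + int[][] layout.
final class CompactLayoutEstimate {

    // Base int[][] array plus one int[] index per used block.
    static long indexBytes(long baseArrayBytes, int blocks, long bytesPerBlock) {
        return baseArrayBytes + blocks * bytesPerBlock;
    }

    // Name bytes plus index bytes.
    static long totalBytes(long nameBytes, long indexBytes) {
        return nameBytes + indexBytes;
    }

    public static void main(String[] args) {
        long index = indexBytes(17_420, 97, 892);
        long total = totalBytes(509_554, index);
        System.out.println("index bytes: " + index);
        System.out.println("total bytes: " + total);
        // Ratio against the ~2.8 MByte HashMap estimate (32-bit figure).
        System.out.println("HashMap estimate / compact layout: "
                + (2_800_000.0 / total));
    }
}
```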


I did some statistics:

total name bytes: 509554
total names count: 19336
average name length: 26.35
used blocks: 97
average block length: 220.89
total words count: 7352
total words chars: 37427
    word  LETTER               occurs 5777 times
    word  WITH                 occurs 2271 times
    word  SMALL                occurs 2088 times
    word  SYLLABLE             occurs 1991 times
    word  SIGN                 occurs 1853 times
    word  CAPITAL              occurs 1578 times
    word  LATIN                occurs 1261 times
    word  YI                   occurs 1236 times
    word  CJK                  occurs 1160 times
    word  IDEOGRAPH            occurs 1094 times
    word  ARABIC               occurs 1047 times
    word  COMPATIBILITY        occurs 1009 times
    word  MATHEMATICAL         occurs 1007 times
    word  CUNEIFORM            occurs 982 times
    word  SYMBOL               occurs 962 times
    word  FORM                 occurs 798 times
    word  SYLLABICS            occurs 630 times
    word  CANADIAN             occurs 630 times
    word  BOLD                 occurs 567 times
    word  GREEK                occurs 517 times
    word  AND                  occurs 508 times
    word  LIGATURE             occurs 508 times
    word  DIGIT                occurs 506 times
    word  MUSICAL              occurs 498 times
    word  TIMES                occurs 492 times
    word  ETHIOPIC             occurs 461 times
    word  HANGUL               occurs 446 times
    word  ITALIC               occurs 423 times
    word  CYRILLIC             occurs 403 times
    word  RADICAL              occurs 385 times
    word  ABOVE                occurs 379 times
    word  SANS                 occurs 368 times
    word  -SERIF               occurs 368 times
    word  VOWEL                occurs 357 times
    word  ARROW                occurs 338 times
    word  DOTS                 occurs 328 times
    word  RIGHT                occurs 326 times
    word  FOR                  occurs 321 times
    word  LEFT                 occurs 316 times
    word  CIRCLED              occurs 312 times
    word  DOUBLE               occurs 308 times
    word  SQUARE               occurs 308 times
    word  VAI                  occurs 300 times
    word  FINAL                occurs 295 times
    word  COMBINING            occurs 293 times
    word  A                    occurs 284 times
    word  B                    occurs 277 times
    word  U                    occurs 269 times
    word  VARIATION            occurs 260 times
    word  SELECTOR             occurs 259 times
    word  PATTERN              occurs 257 times
    word  BRAILLE              occurs 256 times
    word  BYZANTINE            occurs 246 times
    word  O                    occurs 236 times
    word  ISOLATED             occurs 236 times
    word  VERTICAL             occurs 228 times
    word  BELOW                occurs 227 times
    word  DOT                  occurs 227 times
    word  KATAKANA             occurs 222 times
    word  MARK                 occurs 218 times
    word  E                    occurs 216 times
    word  KANGXI               occurs 214 times
    word  LINEAR               occurs 211 times
    word  MODIFIER             occurs 207 times
    word  TIBETAN              occurs 201 times
    word  TWO                  occurs 200 times
    word  I                    occurs 199 times
    word  STROKE               occurs 196 times
    word  MEEM                 occurs 192 times
    word  INITIAL              occurs 177 times
    word  WHITE                occurs 177 times
    word  CARRIER              occurs 175 times
    word  YEH                  occurs 174 times
    word  TO                   occurs 173 times
    word  BLACK                occurs 173 times
    word  ONE                  occurs 165 times
    word  NUMBER               occurs 160 times
    word  MONGOLIAN            occurs 156 times
    word  MYANMAR              occurs 156 times
    word  THREE                occurs 154 times
    word  HOOK                 occurs 152 times
    word  COPTIC               occurs 150 times
    word  KHMER                occurs 146 times
    word  TILE                 occurs 145 times
    word  BOX                  occurs 143 times
    word  PLUS                 occurs 142 times
    word  HORIZONTAL           occurs 137 times
    word  BRACKET              occurs 135 times
    word  HEBREW               occurs 133 times
    word  RIGHTWARDS           occurs 131 times
    word  OF                   occurs 128 times
    word  UP                   occurs 128 times
    word  DRAWINGS             occurs 128 times
    word  KA                   occurs 127 times
    word  ALEF                 occurs 126 times
    word  DOWN                 occurs 125 times
    word  OLD                  occurs 124 times
    word  HALFWIDTH            occurs 122 times
    word  FOUR                 occurs 121 times
    word  GEORGIAN             occurs 121 times
    word  BAR                  occurs 121 times
    word  BALINESE             occurs 121 times
    word  -THAN                occurs 120 times
    word  -CREE                occurs 119 times
    word  L                    occurs 117 times
    word  R                    occurs 117 times
    word  IDEOGRAM             occurs 117 times
    word  HEAVY                occurs 117 times
    word  EQUAL                occurs 115 times
    word  TAI                  occurs 115 times
    word  IDEOGRAPHIC          occurs 115 times
    word  WEST                 occurs 113 times
    word  PARENTHESIZED        occurs 113 times
    word  N                    occurs 112 times
    word  DEVANAGARI           occurs 112 times
    word  FIVE                 occurs 110 times
    word  SCRIPT               occurs 109 times
    word  TAG                  occurs 105 times
    word  HAH                  occurs 104 times
    word  FULLWIDTH            occurs 103 times
    word  TILDE                occurs 101 times
    word  OVER                 occurs 101 times
    word  LIGHT                occurs 100 times
    word  CHARACTER            occurs 100 times
    word  DOMINO               occurs 100 times
    word  NUMERIC              occurs 99 times
    word  LEFTWARDS            occurs 99 times
    word  FRAKTUR              occurs 99 times
    word  HALF                 occurs 98 times
    word  S                    occurs 97 times
    word  MALAYALAM            occurs 95 times
    word  GLAGOLITIC           occurs 94 times
    word  C                    occurs 93 times
    word  JEEM                 occurs 93 times
    word  TELUGU               occurs 93 times
    word  MEDIAL               occurs 91 times
    word  CHOSEONG             occurs 91 times
    word  ACUTE                occurs 91 times
    word  ARMENIAN             occurs 91 times
    word  BENGALI              occurs 91 times
    word  TONE                 occurs 90 times
    word  OR                   occurs 89 times
    word  HIRAGANA             occurs 89 times
    word  HA                   occurs 87 times
    word  THAI                 occurs 87 times
    word  Z                    occurs 86 times
    word  CIRCLE               occurs 86 times
    word  KANNADA              occurs 86 times
    word  Y                    occurs 85 times
    word  CHEROKEE             occurs 85 times
    word  EIGHT                occurs 84 times
    word  ORIYA                occurs 84 times
    word  GUJARATI             occurs 83 times
    word  CHAM                 occurs 83 times
    word  SIX                  occurs 83 times
    word  DASIA                occurs 83 times
    word  JONGSEONG            occurs 82 times
    word  M                    occurs 81 times
    word  H                    occurs 81 times
    word  T                    occurs 81 times
    word  SAURASHTRA           occurs 81 times
    word  TETRAGRAM            occurs 81 times
    word  RUNIC                occurs 81 times
    word  NEW                  occurs 81 times
    word  DESERET              occurs 80 times
    word  SINHALA              occurs 80 times
    word  LUE                  occurs 80 times
    word  D                    occurs 79 times
    word  G                    occurs 79 times
    word  V                    occurs 79 times
    word  NOTATION             occurs 79 times
    word  SYRIAC               occurs 79 times
    word  CIRCUMFLEX           occurs 79 times
    word  PSILI                occurs 79 times
    word  GURMUKHI             occurs 79 times
    word  SEVEN                occurs 78 times
    word  NINE                 occurs 77 times
    word  VOCALIC              occurs 77 times
    word  LONG                 occurs 74 times
    word  LINE                 occurs 74 times
    word  LEPCHA               occurs 74 times
    word  K                    occurs 73 times
    word  DIAERESIS            occurs 73 times
    word  -STRUCK              occurs 72 times
    word  HAMZA                occurs 72 times
    word  TAMIL                occurs 72 times
    word  APL                  occurs 70 times
    word  FUNCTIONAL           occurs 70 times
    word  TELEGRAPH            occurs 69 times
    word  MAKSURA              occurs 69 times
    word  MACRON               occurs 68 times
    word  ALPHA                occurs 68 times
    word  GRAVE                occurs 68 times
    word  P                    occurs 67 times
    word  OMEGA                occurs 67 times
    word  ACCENT               occurs 67 times
    word  JUNGSEONG            occurs 67 times
    word  LIMBU                occurs 66 times
    word  BARB                 occurs 66 times
    word  TRIANGLE             occurs 66 times
    word  LOW                  occurs 66 times
    word  KHAROSHTHI           occurs 65 times
    word  BOPOMOFO             occurs 65 times
    word  LAO                  occurs 65 times
    word  NOT                  occurs 65 times
    word  RA                   occurs 64 times
    word  YA                   occurs 64 times
    word  HEXAGRAM             occurs 64 times
    word  HARPOON              occurs 64 times
    word  TA                   occurs 63 times
    word  REVERSED             occurs 63 times
    word  X                    occurs 62 times
    word  ANGLE                occurs 62 times
    word  MA                   occurs 62 times
    word  HIGH                 occurs 62 times
    word  MONOSPACE            occurs 62 times
    word  OXIA                 occurs 62 times
    word  VARIA                occurs 62 times
    word  GREATER              occurs 62 times
    word  J                    occurs 61 times
    word  PA                   occurs 61 times
    word  LI                   occurs 61 times
    word  KHAH                 occurs 61 times
    word  LAGAB                occurs 61 times
    word  LESS                 occurs 61 times
    word  W                    occurs 59 times
    word  LA                   occurs 59 times
    word  LOWER                occurs 59 times
    word  NKO                  occurs 59 times
    word  NUMERAL              occurs 58 times
    word  LAM                  occurs 58 times
    word  TURNED               occurs 58 times
    word  F                    occurs 57 times
    word  DA                   occurs 57 times
    word  AEGEAN               occurs 57 times
    word  SHORT                occurs 57 times
    word  GA2                  occurs 56 times
    word  PHAGS                occurs 56 times
    word  OPEN                 occurs 56 times
    word  NA                   occurs 56 times
    word  ETA                  occurs 56 times
    word  -PA                  occurs 56 times
    word  STOP                 occurs 56 times
    word  SUNDANESE            occurs 55 times
    word  CYPRIOT              occurs 55 times
    word  BREVE                occurs 55 times
    word  TIFINAGH             occurs 55 times
    word  IOTA                 occurs 54 times
    word  ACROPHONIC           occurs 53 times
    word  SA                   occurs 53 times
    word  PERSIAN              occurs 53 times
    word  ZERO                 occurs 53 times
    word  UPPER                occurs 53 times
    word  ROMAN                occurs 52 times
    word  SUBJOINED            occurs 52 times
    word  NOON                 occurs 52 times
BUILD SUCCESSFUL (total time: 14 seconds)
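
For anyone who wants to reproduce a word count like the one above, here is a minimal sketch in Java. It is not the program that produced the output. The splitting rule (break at spaces, and break before a hyphen so that the hyphen stays with the following word, as in "-SERIF" or "-THAN") is an assumption inferred from the listed entries, and NameWordStats and countWords are made-up names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Counts how often each word occurs across a list of character names.
final class NameWordStats {

    static Map<String, Integer> countWords(List<String> names) {
        Map<String, Integer> counts = new HashMap<>();
        for (String name : names) {
            // Split at spaces, or at the position just before a hyphen
            // (assumed rule; the hyphen is kept as a prefix of the next word).
            for (String word : name.split(" |(?=-)")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Three sample names only; the real input would be all character names.
        List<String> names = List.of(
                "LATIN SMALL LETTER A",
                "LATIN CAPITAL LETTER A",
                "MATHEMATICAL SANS-SERIF BOLD CAPITAL A");
        countWords(names).entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .forEach(e -> System.out.printf("    word  %-20s occurs %d times%n",
                        e.getKey(), e.getValue()));
    }
}
```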


-Ulf
