> > I agree that having a 32-bit UT_UCSChar would waste a lot of memory, and I
> > would like to see a case made first why we need to support 32-bit
> > Unicode.
>
> I wonder if it's worth making this a compile-time option?

On further reflection, the increase in memory consumption would not be at all critical. Going from 16 to 32 bits costs 2 extra bytes per character. If we assume an average of 10 characters per word, then in a 2,000-word essay we are looking at 40kB extra in the standard build and 80kB in the bidi build (due to an internal cache); that is negligible. In a 100,000-word document (a book of ~250 pages) we are looking at 2MB extra in the non-bidi build and 4MB in the bidi build; that is entirely acceptable considering today's memory sizes and prices. There would be some memory increase unrelated to the actual size of the text in the piece table, but that should be negligible.

So we might want to consider making UT_UCSChar 32-bit by default, and add a compile-time option for those who want the 16-bit version only. This should not amount to much more than changing the typedef for UT_UCSChar and cleaning up any poorly designed code that has the size of UT_UCSChar hardwired, but from what I recall, there should be hardly any.

Tomas
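
A rough sketch of the typedef switch and the arithmetic above, in C++. The macro name UT_UCS_16BIT and the helper extraBytes() are made up for illustration and are not existing AbiWord names; the bidi build is assumed to hold one additional copy of the text in its cache, which is how the doubled figures above come about.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical build flag (an assumption, not an existing option):
// define UT_UCS_16BIT to keep the narrow 16-bit character type.
#ifdef UT_UCS_16BIT
typedef std::uint16_t UT_UCSChar;
#else
typedef std::uint32_t UT_UCSChar;  // proposed default
#endif

// Extra bytes used compared with a 16-bit UT_UCSChar.
// Assumption: the bidi build keeps one more copy of the text in its cache,
// so the per-character cost is counted twice there.
inline std::size_t extraBytes(std::size_t words,
                              std::size_t charsPerWord,
                              bool bidi)
{
    // Use sizeof rather than a hardwired constant, so the result follows
    // whichever typedef was selected at compile time.
    const std::size_t perChar = sizeof(UT_UCSChar) - sizeof(std::uint16_t);
    const std::size_t copies = bidi ? 2 : 1;
    return words * charsPerWord * perChar * copies;
}
```

With the 32-bit default, extraBytes(2000, 10, false) gives 40,000 bytes and extraBytes(100000, 10, true) gives 4,000,000 bytes, matching the estimates above. With UT_UCS_16BIT defined, it returns 0 in every case.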
