I am wondering how TDB deals with Unicode strings in general. How are strings stored internally and processed during joins? What I'm most interested in is how Unicode normalization is handled. In theory, I think you would have to store the normalized form of a string so that later, when a join is performed, normalized strings are compared against normalized strings. Otherwise TDB would have to normalize each string at join time, which seems like it would be very expensive. But if you store normalized strings, then you are unable to return the original un-normalized string that was loaded, correct?
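
To make the concern concrete, here is a small sketch in plain Python. It is not TDB code and says nothing about what TDB actually does; the `store` dictionary is a purely hypothetical stand-in for an index, and NFC is just one normalization form picked for illustration. It shows two spellings of the same word that differ at the code-point level, and one possible way to match on a normalized key while still keeping the originals.

```python
import unicodedata

# Two spellings of "café": precomposed U+00E9 vs. "e" + combining acute U+0301.
composed = "caf\u00e9"
decomposed = "cafe\u0301"

# A raw code-point comparison treats them as different strings.
raw_equal = composed == decomposed

# After NFC normalization both map to the same string.
nfc_equal = (unicodedata.normalize("NFC", composed)
             == unicodedata.normalize("NFC", decomposed))

# Hypothetical store (an assumption, not TDB's design): the normalized
# form is the lookup key, and the original strings are kept as values.
store = {}
for original in (composed, decomposed):
    key = unicodedata.normalize("NFC", original)
    store.setdefault(key, []).append(original)

print(raw_equal, nfc_equal, len(store))
```

Normalizing once at load time avoids doing it on every comparison, but as the sketch suggests, returning the original string then means storing it in addition to the normalized key.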
Thanks, Tim
