How is UTF-8 handled in TDB

Tim Harsch Wed, 22 Feb 2012 14:01:40 -0800

I am wondering how TDB deals with UTF-8 strings in general.  How are strings 
stored internally and processed during joins?  What I'm most interested in is 
how Unicode normalization is handled.  In theory, I think you would have to 
store the normalized form of each string so that later, when a join is 
performed, normalized strings are compared against normalized strings.  
Otherwise TDB would have to normalize each string at join time, which seems 
like it would be very expensive.  But if you store normalized strings, then you 
are unable to return the original un-normalized string that was loaded, correct?
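
To make the concern concrete, here is a small sketch in plain Python.  It says 
nothing about how TDB actually stores or compares strings; the choice of NFC 
and the variable names are only assumptions for illustration.  It shows two 
spellings of the same word that differ at the code point level, compare equal 
once normalized, and cannot be restored to the original spelling afterwards.

```python
import unicodedata

# Two spellings of "café": precomposed U+00E9 vs. "e" + combining acute U+0301.
composed = "caf\u00e9"
decomposed = "cafe\u0301"

# Raw comparison, as a code-point-level (or byte-level) join would see it.
raw_equal = composed == decomposed

# Comparison after normalizing both sides to NFC (assumed form, for illustration).
nfc_equal = (unicodedata.normalize("NFC", composed)
             == unicodedata.normalize("NFC", decomposed))

# Storing only the NFC form loses the original spelling of the decomposed input.
original_preserved = unicodedata.normalize("NFC", decomposed) == decomposed

print(raw_equal, nfc_equal, original_preserved)
```

So a store that wants both normalized comparison and faithful round-tripping 
would seem to need either both forms or normalization at comparison time, which 
is the trade-off I am asking about.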


Thanks,
Tim
