On 10/10/2013 05:23, Marvin Humphrey wrote:
> I suspect that having TextTermStepper move away from mutating may have minor performance implications. I ran some benchmarks and found some slight degradation from 0.3 to master and from master to cfish-string-prep1 as of fde94c411c7c73ad35171bb19295f781ed48e0dd -- results below. However, I still support merging the branch, just with the note that this may now be a hotspot to look into when refactoring at some point in the future.
Which commit on the new branch did you benchmark exactly? I added back some of the optimizations in c69fb741a5d016455b56de8ca3890c33f55ce464 (S_write_terms_and_postings in PostingPool) shortly after fde94c411c7c73ad35171bb19295f781ed48e0dd.
The part of the indexing code that's still affected by the TextTermStepper changes should be PostPool_Refill. This code loops over a Lexicon and repeatedly reads terms via Lex_Get_Term. With immutable strings, a new String is allocated for each term, whereas a mutable string could be overwritten in place. I can't see an easy way to speed that up, but the performance degradation shouldn't be too bad.
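To make the allocation difference concrete, here is a rough standalone sketch in plain C. It is not the actual Lucy or Clownfish code: the TERMS array stands in for a Lexicon, and read_terms_mutable and read_terms_immutable are made-up names for illustration only. The first function reuses one buffer and grows it only when a term doesn't fit; the second allocates a fresh copy for every term, which is roughly what reading terms as immutable strings implies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a lexicon: a fixed, sorted list of terms. */
static const char *const TERMS[] = { "apple", "banana", "cherry", "date" };
#define NUM_TERMS (sizeof(TERMS) / sizeof(TERMS[0]))

/* Mutable style: one buffer is overwritten for each term and only
 * reallocated when the next term does not fit.
 * Returns the number of allocations; *total_len gets the summed lengths. */
size_t read_terms_mutable(size_t *total_len) {
    char  *buf    = NULL;
    size_t cap    = 0;
    size_t allocs = 0;
    *total_len = 0;
    for (size_t i = 0; i < NUM_TERMS; i++) {
        size_t need = strlen(TERMS[i]) + 1;   /* include trailing NUL */
        if (need > cap) {
            char *grown = realloc(buf, need);
            assert(grown != NULL);
            buf = grown;
            cap = need;
            allocs++;
        }
        memcpy(buf, TERMS[i], need);          /* overwrite in place */
        *total_len += strlen(buf);
    }
    free(buf);
    return allocs;
}

/* Immutable style: every term read yields a freshly allocated copy
 * that is released once the caller is done with it.
 * Returns the number of allocations; *total_len gets the summed lengths. */
size_t read_terms_immutable(size_t *total_len) {
    size_t allocs = 0;
    *total_len = 0;
    for (size_t i = 0; i < NUM_TERMS; i++) {
        size_t need = strlen(TERMS[i]) + 1;
        char *term = malloc(need);
        assert(term != NULL);
        memcpy(term, TERMS[i], need);
        allocs++;
        *total_len += strlen(term);
        free(term);
    }
    return allocs;
}
```

Both functions see the same terms, but the immutable variant performs one allocation per term, while the mutable one only allocates when the buffer has to grow. In a real index the per-term allocation cost would be weighed against everything else the loop does, so this only shows where the extra work comes from, not how large it is.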
Nick
