Hi,
 
 At work, our application does a lot of DOM parsing of small XML blobs with 
similarly small DTDs. We do not have grammar caching enabled (this is being 
worked on).
 
 Profiling our app during normal operation showed that we were spending 7% of 
our CPU time in log(), called from 
xerces/src/xercesc/validators/common/DFAContentModel.cpp:1050.
 
 Printing out the values passed to log() showed that the argument was 1 in 99% 
of cases (and 2 in the other 1%).
 
 Looking at the line in question:
 
 if(fNumItems <= setT->getBitCountInRange(fLeafIndexes[1], fLeafIndexes[fNumItems])*log((float)fNumItems))
 
 I note that:
 
 1) fNumItems is an unsigned int which is > 0
 2) log(1) == 0.0
 3) The expression:
    setT->getBitCountInRange(fLeafIndexes[1], fLeafIndexes[fNumItems])*log((float)fNumItems)
    is thus always 0 when fNumItems is 1
 4) 1 <= 0 is false, so this branch is never taken when fNumItems is 1.
 
 Changing the line to:
 
 if(fNumItems > 1 && fNumItems <= setT->getBitCountInRange(fLeafIndexes[1], fLeafIndexes[fNumItems])*log((float)fNumItems))
     
 avoids both the log() call and the getBitCountInRange() call when fNumItems 
is 1, and resulted in a ~7% speedup for us.
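 
 To illustrate why the extra check should not change behaviour, here is a 
small standalone sketch (not the Xerces code itself). The names numItems and 
bitCount are hypothetical stand-ins for fNumItems and for the value returned 
by getBitCountInRange(), and std::size_t is only an assumed type for that 
count. It compares the original and the guarded condition over a range of 
inputs:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for the original test in DFAContentModel.cpp.
// numItems plays the role of fNumItems; bitCount plays the role of the
// value returned by setT->getBitCountInRange(...).
bool originalCondition(unsigned int numItems, std::size_t bitCount) {
    return numItems <= bitCount * std::log(static_cast<float>(numItems));
}

// Same test with the proposed short-circuit: when numItems is 1 (or 0),
// neither log() nor the multiplication is evaluated.
bool guardedCondition(unsigned int numItems, std::size_t bitCount) {
    return numItems > 1 &&
           numItems <= bitCount * std::log(static_cast<float>(numItems));
}

// Returns true if both forms give the same answer for every numItems in
// [1, maxItems] and every bitCount in [0, maxBits].
bool conditionsAgree(unsigned int maxItems, std::size_t maxBits) {
    for (unsigned int n = 1; n <= maxItems; ++n) {
        for (std::size_t b = 0; b <= maxBits; ++b) {
            if (originalCondition(n, b) != guardedCondition(n, b)) {
                return false;
            }
        }
    }
    return true;
}

// Convenience check of the key case from the list above: log(1) is 0, so
// the right-hand side is 0 and 1 <= 0 is false, whatever bitCount is.
void checkSingleItemCase() {
    assert(!originalCondition(1, 1000));
    assert(!guardedCondition(1, 1000));
}
```

 The sketch only models the arithmetic of the condition. In the real code the 
guard also skips the getBitCountInRange() call itself, which is where part of 
the saving comes from.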
 
 If this isn't contentious, it would be great to see this change in mainline.
 
 Cheers,
 Jon

