Re: [lucy-dev] Some quick benchmarks

Joe Schaefer Thu, 08 Dec 2011 14:38:48 -0800

When is all this nifty code going to land in trunk? Don't
wait for anyone to give you permission, Nick; that decision
is all yours.

----- Original Message -----
> From: Nick Wellnhofer <[email protected]>
> To: [email protected]
> Cc: 
> Sent: Thursday, December 8, 2011 2:43 PM
> Subject: Re: [lucy-dev] Some quick benchmarks
> 
> On 08/12/11 20:04, Nathan Kurz wrote:
>>  I'm mostly listening in on this conversation because I haven't thought
>>  much about indexing, but the magnitude of improvement here surprises
>>  me: I wouldn't have thought that there would be that much time to
>>  shave off! My presumption was that everything would be dominated by
>>  disk I/O, and that the actual tokenizing time would be tiny. Are
>>  both of these numbers measured in memory with a pre-warmed cache, so
>>  no disk reads are involved? Also, have you controlled for whether the
>>  data is synced to disk after indexing?
> 
> These numbers are with a pre-warmed cache. Also, the data isn't synced, AFAIU.
> But I think the analysis chain is CPU-bound in the general case. All that
> tokenizing, normalizing and stemming uses a lot of CPU cycles.
> 
>>  I'm not in a position to do it, but it might be insightful to do a
>>  quick profile of where these two are spending their time. Are we
>>  gaining because the algorithm is faster, because we have less
>>  function-call overhead, or because of some confounding factor?
> 
> It's mainly that the algorithms are faster. The CaseFolder seems to be
> especially slow, but I have no idea why.
> 
>>  OProfile on Linux is very easy to use once you have it set up. In
>>  case you aren't familiar with it, this is a good intro:
>>
>>  http://lbrandy.com/blog/2008/11/oprofile-profiling-in-linux-for-fun-and-profit/
> 
> I have used it once and found it hard to set up on a virtual machine. But
> it's very useful if you want to profile long-running processes.
> 
> Nick
>
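One quick way to check whether a benchmark like this is CPU-bound rather than waiting on disk I/O is to compare the process's CPU time with its elapsed wall-clock time. The sketch below assumes Python and uses a hypothetical toy_analysis function (whitespace splitting plus case folding) as a stand-in for a real analysis chain; it is not Lucy's tokenizer or CaseFolder. A cpu/wall ratio close to 1 suggests the work is CPU-bound, while a ratio well under 1 suggests the process spent time blocked, for example on disk reads.

```python
import time


def cpu_share(work, *args):
    """Run work(*args) once; return (cpu_seconds, wall_seconds, cpu/wall)."""
    wall0 = time.perf_counter()   # high-resolution wall-clock timer
    cpu0 = time.process_time()    # user + system CPU time of this process
    work(*args)
    cpu = time.process_time() - cpu0
    wall = time.perf_counter() - wall0
    return cpu, wall, (cpu / wall if wall > 0 else 0.0)


def toy_analysis(text, rounds):
    # Hypothetical stand-in for an analysis chain: tokenize on whitespace
    # and case-fold each token. Returns total folded characters processed.
    count = 0
    for _ in range(rounds):
        for token in text.split():
            count += len(token.casefold())
    return count


if __name__ == "__main__":
    text = "The Quick Brown Fox Jumps Over The Lazy Dog " * 10_000
    cpu, wall, ratio = cpu_share(toy_analysis, text, 20)
    print(f"cpu={cpu:.3f}s wall={wall:.3f}s cpu/wall={ratio:.2f}")
```

For a single-threaded run with the input already in memory, the ratio should sit near 1. Repeating the measurement with a cold cache, or with data synced to disk after writing, would show whether I/O starts to dominate.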
