On Thu, Dec 8, 2011 at 10:02 AM, Nick Wellnhofer <[email protected]> wrote:
> On 08/12/2011 01:41, Marvin Humphrey wrote:
>
> Here is more data from a real world indexing run:
>
> RT+CF: 139 secs
> ST+N: 112 secs
Hi Nick --

I'm mostly listening in on this conversation because I haven't thought much about indexing, but the magnitude of the improvement here surprises me: I wouldn't have thought there was that much time to shave off. My assumption was that everything would be dominated by disk I/O and that the actual tokenizing time would be tiny. Are these numbers both from runs working in memory with a pre-warmed cache, so that no disk reads are involved? Also, have you controlled for whether the data is synced to disk after indexing?

I'm not in a position to do it myself, but it might be insightful to do a quick profile of where these two runs spend their time. Are we gaining because the algorithm is faster, because we have less function call overhead, or because of something confounding?

Oprofile on Linux is very easy to use once you have it set up. In case you aren't familiar with it, this is a good intro:

http://lbrandy.com/blog/2008/11/oprofile-profiling-in-linux-for-fun-and-profit/

Thanks!

--nate
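
P.S. In case a concrete starting point helps, a profiling run with the classic opcontrol interface looks roughly like the sketch below. The indexer invocation is just a placeholder for whatever script produced the numbers above, and the cache-dropping step only matters if you want a cold-cache comparison.

    # Drop the page cache for a cold-cache run
    # (skip this when comparing warm-cache numbers).
    sync && echo 3 | sudo tee /proc/sys/vm/drop_caches

    # Start system-wide profiling; no kernel symbols needed here.
    sudo opcontrol --no-vmlinux
    sudo opcontrol --reset
    sudo opcontrol --start

    # Run the indexer under test (placeholder name).
    perl indexer.pl

    # Flush samples and stop the profiling daemon.
    sudo opcontrol --dump
    sudo opcontrol --shutdown

    # Per-symbol breakdown of where the time went.
    opreport --symbols | head -40

Something along those lines should be enough to tell whether the win is in the tokenizer itself or in the call overhead around it.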
