Re: Lucene performance: benchmarktemplate.xml

Glen Newton Fri, 18 Apr 2008 08:06:22 -0700

Hi Anshum,

A reasonable question. Answer: a 64-bit architecture running a 64-bit
Java VM, so the roughly 2 GB ceiling you are thinking of (a 32-bit
limit) does not apply. It is great!  :-)


> Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_02-b05, mixed mode)
> OS Version: Linux OpenSUSE 10.2 (64-bit X86-64)
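
If you want to confirm what your own JVM is doing, here is a minimal
sketch. The class name HeapCheck is made up for illustration; the calls
are standard java.lang ones. It prints the architecture, the VM name and
the maximum heap the VM will try to use:

```java
// Hypothetical helper, not part of the benchmark application.
final class HeapCheck {

    /** Maximum heap the JVM will attempt to use, in megabytes (roughly tracks -Xmx). */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Standard system properties; os.arch is e.g. "amd64" on 64-bit Linux.
        System.out.println("os.arch      = " + System.getProperty("os.arch"));
        System.out.println("java.vm.name = " + System.getProperty("java.vm.name"));
        System.out.println("max heap MB  = " + maxHeapMb());
    }
}
```

Run it with the same flags as the benchmark, e.g.
java -Xms4000m -Xmx6000m HeapCheck. The reported maximum is usually a
little below the -Xmx value, and a 32-bit VM would normally refuse a
heap request this large at startup.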

If you have any other questions, please let me know.  :-)

-Glen

On 18/04/2008, Anshum <[EMAIL PROTECTED]> wrote:
> Hi Glen,
>
>  I am not too clear about this, but isn't there a limit on the memory
>  the JVM can use? I thought the limit was 1.3 GB resident and 2 GB of
>  memory in all, yet you listed the memory consumption as
>  -Xms4000m -Xmx6000m.
>  Could someone please help me understand this?
>
>  --
>
> Anshum
>
>
>  On Wed, 2008-04-16 at 23:21 +0530, Glen Newton wrote:
>  > On 16/04/2008, Michael McCandless <[EMAIL PROTECTED]> wrote:
>  > > These are great results!  Thanks for posting.
>  >
>  > Thanks!
>  >
>  > >
>  > >  I'd be curious if you'd get better indexing throughput by using a single
>  > > IndexWriter, fed by all 8 indexing threads, with an 8X bigger RAM buffer,
>  > > instead of 8 IndexWriters that merge in the end.
>  >
>  > While I am new to this list, I have been trying different
>  > combinations/configurations for this over the last 18 months. To
>  > answer your questions: no, not on our multi-core machine. But I
>  > haven't  tried your suggested scenario with >= v2.2, so it is possible
>  > that it might be better.
>  >
>  > >  How long does that final merge take now?
>  >
>  > I don't have that timed. I will alter the app to record this.
>  >
>  > >  Also, 64 threads doing document construction seems too high?  You may be
>  > > losing some performance to the cost of thread context switching.
>  >
>  > I did performance tests on 8,16,24,32,48,64,72,96,128 threads. 64 was
>  > the sweet spot, for this particular configuration.
>  >
>  > >  Did you use autoCommit=false?  I think it should help since you have so
>  > > many stored fields and some term vectors.
>  >
>  > Damn, I am using the defaults (autoCommit=true). I will re-run and
>  > post the results!
>  >
>  > Thanks,  :-)
>  >
>  > -glen
>  >
>  > >  Mike
>  > >
>  > >
>  > >  Glen Newton wrote:
>  > >
>  > > > Cass,
>  > > > Thanks for converting it. I've posted it to my blog:
>  > > >
>  > > > http://zzzoot.blogspot.com/2008/04/lucene-indexing-performance-benchmarks.html
>  > > >
>  > > > Sorry for the XML tags: I guess I followed the instructions on the
>  > > > Lucene performance benchmarks page too literally ("Post these figures
>  > > > to the lucene-user mailing list using this template.").
>  > > >
>  > > > Sorry if it hurt your eyes!  :-)
>  > > >
>  > > > -Glen
>  > > >
>  > > > On 15/04/2008, Cass Costello <[EMAIL PROTECTED]> wrote:
>  > > >
>  > > > > I just did that so I could read it. :)  I'll leave it up until Glen
>  > > > >  resends or posts it somewhere...
>  > > > >  http://www.casscostello.com/?page_id=28
>  > > > >
>  > > > >
>  > > > >
>  > > > >
>  > > > >  On Tue, Apr 15, 2008 at 5:18 PM, Ian Holsman <[EMAIL PROTECTED]> wrote:
>  > > > >
>  > > > >
>  > > > > > Hi Glen.
>  > > > > > Can you resend this in plain text?
>  > > > > > Or put the HTML up on a server somewhere and point to it with a
>  > > > > > brief summary in the post?
>  > > > > > I'd love to look and read it, but all those tags are making me go
>  > > > > > blind.
>  > > > > >
>  > > > > >
>  > > > > > Glen Newton wrote:
>  > > > > >
>  > > > > >
>  > > > > > > <benchmark>
>  > > > > > >  <ul>
>  > > > > > >  <p>
>  > > > > > >  <b>Hardware Environment</b><br/>
>  > > > > > >  <li><i>Dedicated machine for indexing</i>: yes</li>
>  > > > > > >  <li><i>CPU</i>: Dual processor dual core Xeon CPU 3.00GHz;
>  > > > > > > hyperthreading ON for 8 virtual cores</li>
>  > > > > > >  <li><i>RAM</i>: 8GB</li>
>  > > > > > >  <li><i>Drive configuration</i>: Dell EMC AX150 storage array fibre
>  > > > > > > channel</li>
>  > > > > > >  </p>
>  > > > > > >  <p>
>  > > > > > >  <b>Software environment</b><br/>
>  > > > > > >  <li><i>Lucene Version</i>: 2.3.1</li>
>  > > > > > >  <li><i>Java Version</i>:  Java(TM) SE Runtime Environment (build
>  > > > > > > 1.6.0_02-b05)</li>
>  > > > > > >  <li><i>Java VM</i>: Java HotSpot(TM) 64-Bit Server VM (build
>  > > > > > > 1.6.0_02-b05, mixed mode)</li>
>  > > > > > >  <li><i>OS Version</i>: Linux OpenSUSE 10.2 (64-bit X86-64)</li>
>  > > > > > >  <li><i>Location of index</i>: Filesystem, on attached storage</li>
>  > > > > > >  </p>
>  > > > > > >  <p>
>  > > > > > >  <b>Lucene indexing variables</b><br/>
>  > > > > > >  <li><i>Number of source documents</i>: 6,404,464</li>
>  > > > > > >  <li><i>Total filesize of source documents</i>: 141GB; Note that this
>  > > > > > > is only the full-text: the metadata (title, author(s), abstract,
>  > > > > > > keywords, journal name) are in addition to this</li>
>  > > > > > >  <li><i>Average filesize of source documents</i>:
>  > > > > > > 22KB + metadata (see above)</li>
>  > > > > > >  <li><i>Source documents storage location</i>: Where are the documents
>  > > > > > > being indexed located?
>  > > > > > >  Filesystem</li>
>  > > > > > >  <li><i>File type of source documents</i>: text (PDFs converted to
>  > > > > > > text then gzipped)</li>
>  > > > > > >  <li><i>Parser(s) used, if any</i>: None, but the files were gzipped
>  > > > > > > and had to be un-gzipped by the Java application, which also did the
>  > > > > > > indexing</li>
>  > > > > > >  <li><i>Analyzer(s) used</i>: StandardAnalyzer</li>
>  > > > > > >  <li><i>Number of fields per document</i>: 24</li>
>  > > > > > >  <li><i>Type of fields</i>: all text; 20 stored; 3 of indexed
>  > > > > > > tokenized with term vector (full-text [not stored], title, abstract);
>  > > > > > > 10 stored with no parsing; </li>
>  > > > > > >  <li><i>Index persistence</i>: FSDirectory</li>
>  > > > > > >  <li><i>Index size</i>: 83GB</li>
>  > > > > > >  <li><i>Number of terms</i>: 143,298,010</li>
>  > > > > > >  </p>
>  > > > > > >  <p>
>  > > > > > >  <b>Figures</b><br/>
>  > > > > > >  <li><i>Time taken (in ms/s as an average of at least 3 indexing
>  > > > > > > runs)</i>: 20.5 hours</li>
>  > > > > > >  <li><i>Time taken / 1000 docs indexed</i>: 11.5 seconds </li>
>  > > > > > >  <li><i>Memory consumption</i>:  -Xms4000m  -Xmx6000m</li>
>  > > > > > >  <li><i>Query speed</i>: average time a query takes, type
>  > > > > > >  of queries (e.g. simple one-term query, phrase query),
>  > > > > > >  not measuring any overhead outside Lucene</li>
>  > > > > > >  </p>
>  > > > > > >  <p>
>  > > > > > >  <b>Notes</b><br/>
>  > > > > > >  <li><i>Notes</i>:
>  > > > > > >      <ul>
>  > > > > > >        <li>
>  > > > > > >          These are journal articles, so the additional fields besides
>  > > > > > > the full-text are bibliographic metadata, such as title, authors,
>  > > > > > > abstract, keywords, journal name, volume, issue, start page, year.
>  > > > > > >        </li>
>  > > > > > >        <li>Java command line directives: -XX:+AggressiveOpts
>  > > > > > > -XX:+ScavengeBeforeFullGC -XX:-UseParallelGC   -server  -Xms4000m
>  > > > > > > -Xmx6000m
>  > > > > > >        </li>
>  > > > > > >        <li>Highly multithreaded & pipelined architecture using
>  > > > > > > java.util.concurrent.ThreadPoolExecutor
>  > > > > > >        </li>
>  > > > > > >        <li>File system file reading and un-gzip performed multithreaded
>  > > > > > >        </li>
>  > > > > > >        <li>Eight separate parallel IndexWriters are fed by the pipeline
>  > > > > > > (creation of Document objects occurs in parallel with 64 threads),
>  > > > > > > merged at end into single index. Each parallel index had slightly
>  > > > > > > different RAM_BUFFER_SIZE_MB (64, 67, 70, 73, 76, 79, 83, 85 MB
>  > > > > > > respectively), so that flushing wouldn't all happen at the same time.
>  > > > > > >        </li>
>  > > > > > >        <li>
>  > > > > > >          Contact: glen DOT newton AT nrc-cnrc DOT gc DOT ca
>  > > > > > >        </li>
>  > > > > > >
>  > > > > > >      </ul>
>  > > > > > > </li>
>  > > > > > >  </p>
>  > > > > > >  </ul>
>  > > > > > > </benchmark>
>  > > > > > >
>  > > > > > >
>  > > > > > >
>  > > > > > >
>  > > > > > >
>  > > > > >
>  > > > > >
>  > > > > >
>  > > > > > ---------------------------------------------------------------------
>  > > > > > To unsubscribe, e-mail: [EMAIL PROTECTED]
>  > > > > > For additional commands, e-mail: [EMAIL PROTECTED]
>  > > > > >
>  > > > > >
>  > > > > >
>  > > > >
>  > > > >
>  > > > >
>  > > > > --
>  > > > >  Lego timeline:
>  > > > >
>  > > > >  http://cache.gizmodo.com/assets/resources/2008/01/lego-brick4-timeline.jpg
>  > > > >
>  > > > >
>  > > >
>  > > >
>  > > >
>  > > >
>  > >
>  > >
>  > >
>  > >
>  >
>  >
>  >
>
>
>
>


