[ 
https://issues.apache.org/jira/browse/LUCENE-211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Busch resolved LUCENE-211.
----------------------------------

    Resolution: Duplicate
      Assignee:     (was: Lucene Developers)

This is a very similar idea to LUCENE-843, which is already committed.

> [Patch] replace DocumentWriter with InvertedDocument for performance
> --------------------------------------------------------------------
>
>                 Key: LUCENE-211
>                 URL: https://issues.apache.org/jira/browse/LUCENE-211
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: unspecified
>         Environment: Operating System: All
> Platform: All
>            Reporter: Brian Slesinsky
>            Priority: Minor
>         Attachments: inverted-doc.patch
>
>
> I've found a way to improve Lucene's indexing performance by about 45% for my
> dataset.
>
> Here's how it works. Currently, the indexing process goes like this:
> - Use DocumentWriter to create an inverted index and serialize a one-document
>   segment to a RAMDirectory.
> - When enough documents have been read, deserialize the one-document segments
>   in the RAMDirectory and merge them, writing the merged segment to disk.
>
> What I've done instead is create a new class, InvertedDocument, that keeps the
> inverted index in a Map and can also be used directly as input for a merge.
> This avoids the serialization/deserialization step, and the RAMDirectory is no
> longer used when indexing.
>
> The patch applies to the contents of CVS as of today (April 3). (It's a big
> patch and includes some minor style tweaks that aren't directly related.)
>
> I did the performance testing using a simple application that creates an index
> from a file containing messages extracted from a bulletin board. It could index
> about 100 kilobytes/second with Lucene 1.3, and 145 kilobytes/second with the
> patch. This is on a 700 MHz eMac, which is pretty slow at Java, and the
> documents being indexed are, on average, less than a screenful.
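
For readers who have not opened inverted-doc.patch, the idea in the quoted
description can be illustrated roughly as follows. This is a minimal sketch,
not the code from the patch: the class name InvertedDocumentSketch, its
methods, and the use of whitespace-free pre-tokenized terms are all
assumptions made for illustration, and it uses modern Java generics that
Lucene 1.3-era code would not have had. Each document keeps its postings in a
Map, and several such documents are merged directly from memory, with no
intermediate one-document segment written to a RAMDirectory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; this is NOT the InvertedDocument class from the patch.
final class InvertedDocumentSketch {
    // term -> positions at which the term occurs in this one document
    private final Map<String, List<Integer>> postings = new TreeMap<>();
    private int length = 0; // number of tokens added so far

    // Record the next token of the document at the next position.
    void addToken(String term) {
        postings.computeIfAbsent(term, t -> new ArrayList<>()).add(length++);
    }

    Map<String, List<Integer>> postings() {
        return postings;
    }

    // Merge in-memory documents directly into term -> (docId -> positions).
    // Document ids are assigned from the order of the input list.
    static Map<String, Map<Integer, List<Integer>>> merge(List<InvertedDocumentSketch> docs) {
        Map<String, Map<Integer, List<Integer>>> merged = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            for (Map.Entry<String, List<Integer>> e : docs.get(docId).postings.entrySet()) {
                merged.computeIfAbsent(e.getKey(), t -> new TreeMap<>())
                      .put(docId, e.getValue());
            }
        }
        return merged;
    }
}
```

A real implementation would write the merged postings out as an on-disk
segment; the sketch stops at the merged in-memory structure, since the point
is only that the per-document maps can feed the merge without a
serialize/deserialize round trip.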

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


