[ https://issues.apache.org/jira/browse/LUCENE-211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Busch resolved LUCENE-211.
----------------------------------
Resolution: Duplicate
Assignee: (was: Lucene Developers)
This is a very similar idea to LUCENE-843, which is already committed.
> [Patch] replace DocumentWriter with InvertedDocument for performance
> --------------------------------------------------------------------
>
> Key: LUCENE-211
> URL: https://issues.apache.org/jira/browse/LUCENE-211
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Affects Versions: unspecified
> Environment: Operating System: All
> Platform: All
> Reporter: Brian Slesinsky
> Priority: Minor
> Attachments: inverted-doc.patch
>
>
> I've found a way to improve Lucene's indexing performance by about 45% for my
> dataset.
> Here's how it works: currently the indexing process goes like this:
> - use DocumentWriter to create an inverted index and serialize a
>   one-document segment to a RAMDirectory
> - when enough documents have been read, deserialize the one-document
>   segments in the RAMDirectory and merge them, writing the merged segment
>   to disk.
> What I've done instead is create a new class, InvertedDocument, that keeps
> the inverted index in a Map and can also be used directly as input for a
> merge. This avoids the serialization/deserialization step, and the
> RAMDirectory is no longer used when indexing.
> The patch applies to the contents of CVS as of today (April 3). (It's a big
> patch and includes some minor style tweaks that aren't directly related.)
> I did the performance testing using a simple application that creates an
> index from a file containing messages extracted from a bulletin board. It
> could index about 100 kilobytes/second with Lucene 1.3, and 145
> kilobytes/second with the patch. This is on a 700 MHz eMac, which is pretty
> slow at Java, and the documents being indexed are, on average, less than a
> screenful.
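
For readers who want a concrete picture of the approach described in the
quoted issue, here is a minimal sketch. It is not the code from
inverted-doc.patch and it uses no Lucene API: the names InvertedDocSketch,
InvertedDoc, Posting and merge are hypothetical, and the whitespace tokenizer
merely stands in for a real analyzer. Each document keeps its inverted form
in a sorted Map, and the merge reads those maps directly instead of first
serializing one-document segments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration only; not the patch's InvertedDocument class. */
public class InvertedDocSketch {

    /** One document's inverted form: term -> positions, sorted by term. */
    public static final class InvertedDoc {
        public final int docId;
        public final TreeMap<String, List<Integer>> positions = new TreeMap<>();

        public InvertedDoc(int docId, String text) {
            this.docId = docId;
            // Naive whitespace tokenizer (assumption); a real analyzer does more.
            String[] tokens = text.toLowerCase().trim().split("\\s+");
            for (int pos = 0; pos < tokens.length; pos++) {
                if (tokens[pos].isEmpty()) {
                    continue; // empty input yields a single empty token
                }
                positions.computeIfAbsent(tokens[pos], t -> new ArrayList<>()).add(pos);
            }
        }
    }

    /** A posting: the document a term occurs in, and where. */
    public static final class Posting {
        public final int docId;
        public final List<Integer> positions;

        public Posting(int docId, List<Integer> positions) {
            this.docId = docId;
            this.positions = positions;
        }
    }

    /**
     * Merges buffered documents straight into term -> postings.
     * No intermediate serialized segment is written or read back.
     */
    public static TreeMap<String, List<Posting>> merge(List<InvertedDoc> docs) {
        TreeMap<String, List<Posting>> merged = new TreeMap<>();
        for (InvertedDoc doc : docs) {
            for (Map.Entry<String, List<Integer>> e : doc.positions.entrySet()) {
                merged.computeIfAbsent(e.getKey(), t -> new ArrayList<>())
                      .add(new Posting(doc.docId, e.getValue()));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<InvertedDoc> buffered = Arrays.asList(
                new InvertedDoc(0, "lucene indexes text"),
                new InvertedDoc(1, "text search with lucene"));
        // Print each term with the documents and positions it occurs at.
        merge(buffered).forEach((term, postings) -> {
            StringBuilder line = new StringBuilder(term).append(':');
            for (Posting p : postings) {
                line.append(" doc").append(p.docId).append(p.positions);
            }
            System.out.println(line);
        });
    }
}
```

A real implementation would write the merged postings to disk as a segment;
the sketch stops at the in-memory merge because that is the step the issue
says no longer goes through a RAMDirectory.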
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.