[jira] Commented: (LUCENE-550) InstantiatedIndex - faster but memory consuming index

Karl Wettin (JIRA) Mon, 14 Jan 2008 07:48:57 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558640#action_12558640
 ]


Karl Wettin commented on LUCENE-550:
------------------------------------

I was poking around in the javadocs of this and came to the conclusion that 
InstantiatedIndexWriter is deprecated code; it is enough that one can construct 
an InstantiatedIndex from an optimized IndexReader. This makes all 
InstantiatedIndexes immutable, which makes the no-locks caveat go away.

Also, it is a hassle to make sure that InstantiatedIndexWriter works just as 
IndexWriter does.
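
To illustrate why immutability removes the locking concern, here is a minimal 
sketch in plain Java. It is not the InstantiatedIndex code: SnapshotIndex, its 
constructor and docsFor are hypothetical names, and a list of strings stands in 
for the optimized IndexReader. Everything is built once in the constructor and 
frozen, so concurrent readers never see a partial update.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical stand-in for an immutable in-memory index (NOT Lucene's
 * InstantiatedIndex). All postings are built in the constructor and never
 * modified afterwards, so readers need no locks.
 */
final class SnapshotIndex {
    // term -> ascending, unmodifiable list of document numbers
    private final Map<String, List<Integer>> postings;

    /** Builds the whole index once; a document's number is its list position. */
    SnapshotIndex(List<String> documents) {
        Map<String, List<Integer>> building = new HashMap<>();
        for (int doc = 0; doc < documents.size(); doc++) {
            for (String term : documents.get(doc).toLowerCase().split("\\s+")) {
                if (term.isEmpty()) {
                    continue; // leading whitespace yields an empty token
                }
                List<Integer> docs =
                        building.computeIfAbsent(term, t -> new ArrayList<>());
                // Record each document once, even if the term repeats in it.
                if (docs.isEmpty() || docs.get(docs.size() - 1) != doc) {
                    docs.add(doc);
                }
            }
        }
        // Freeze every postings list, then the map itself.
        for (Map.Entry<String, List<Integer>> e : building.entrySet()) {
            e.setValue(Collections.unmodifiableList(e.getValue()));
        }
        this.postings = Collections.unmodifiableMap(building);
    }

    /** Returns the document numbers containing the term, or an empty list. */
    List<Integer> docsFor(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptyList());
    }
}
```

Under these assumptions, updating the index means building a new SnapshotIndex 
from a fresh source and swapping the reference, rather than mutating the 
existing one in place.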

In the future, a segmented Directory-facade could be built on top of this, 
where each InstantiatedIndex is a segment created by an IndexWriter flush. It 
would potentially be slower to populate this, but it would be compatible with 
everything. Adding more than one segment will require merging and optimizing 
indices back and forth in RAMDirectories a bit, but InstantiatedIndexes are 
usually quite small.

It feels like much of that code is already there.

On the matter of RAM consumption, using a profiler I recently noticed that a 
3.2MB directory of 3-5;3-3;3-5 ngrams with term vectors consumed something like 
35MB of RAM when loaded into an InstantiatedIndex.




> InstantiatedIndex - faster but memory consuming index
> -----------------------------------------------------
>
>                 Key: LUCENE-550
>                 URL: https://issues.apache.org/jira/browse/LUCENE-550
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Store
>    Affects Versions: 2.0.0
>            Reporter: Karl Wettin
>            Assignee: Grant Ingersoll
>         Attachments: HitCollectionBench.jpg, 
> LUCENE-550_20071021_no_core_changes.txt, test-reports.zip
>
>
> Represented as a coupled graph of class instances, this all-in-memory index 
> store implementation delivers search results up to 100 times faster than 
> the file-centric RAMDirectory at the cost of greater RAM consumption.
> Performance seems to be a little bit better than log2(n) (binary search). No 
> real data on that, just my eyes.
> Populated with a single document, InstantiatedIndex is almost, but not quite, 
> as fast as MemoryIndex.
> At 20,000 documents of 10-50 characters length, InstantiatedIndex outperforms 
> RAMDirectory some 30x,
> 15x at 100 documents of 2000 characters length,
> and is linear to RAMDirectory at 10,000 documents of 2000 characters length.
> Mileage may vary depending on term saturation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

