[jira] Commented: (LUCENE-550) InstantiatedIndex - faster but memory consuming index

Karl Wettin (JIRA) Sun, 18 Mar 2007 07:49:32 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12481952
 ]


Karl Wettin commented on LUCENE-550:
------------------------------------

> Nicolas Lalevée [18/Mar/07 02:04 AM]

> This is a very interesting benchmark graph! Note that there is just one small
> mistake in it: the labels of the axes are switched.

The test is fairly crude: a set of queries of varying complexity that, for 
each iteration, is run against a new IndexSearcher and IndexReader. The index 
is optimized at every measuring point.

> And you said that you still see a large gain with 250 000 documents because
> of the retrieval cost. But if I had to choose to keep everything in memory,
> I wouldn't put the data of my own model into Lucene. I would keep it in memory
> without transforming it into stored Lucene Documents. I would transform it
> only for indexing purposes and keep just an ID in the Lucene store, which
> would help me map the search results to my own model data. This would avoid
> the transformation Lucene-Document -> MyModel-Data.

I can only agree.
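
A minimal sketch of the ID-only approach described in the quote above. It is not Lucene code: the toy inverted index (IdOnlyIndex) merely stands in for Lucene, and Product is a hypothetical model class. The point it illustrates is that the index holds only ids, and hits are resolved against objects that already live in application memory, so no Document-to-model transformation takes place.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical application model; it lives only in application memory. */
final class Product {
    final int id;
    final String title;

    Product(int id, String title) {
        this.id = id;
        this.title = title;
    }
}

/** Toy inverted index standing in for Lucene: term -> ids, nothing else is stored. */
final class IdOnlyIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final Map<Integer, Product> model = new HashMap<>();

    /** Indexes the title's terms, but keeps the object itself in our own map. */
    void add(Product p) {
        model.put(p.id, p);
        for (String term : p.title.toLowerCase(Locale.ROOT).split("\\s+")) {
            postings.computeIfAbsent(term, t -> new LinkedHashSet<>()).add(p.id);
        }
    }

    /** Single-term search: hits are ids, resolved directly against the model. */
    List<Product> search(String term) {
        String key = term.toLowerCase(Locale.ROOT);
        List<Product> hits = new ArrayList<>();
        for (int id : postings.getOrDefault(key, Collections.<Integer>emptySet())) {
            hits.add(model.get(id));
        }
        return hits;
    }
}
```

With a real Lucene index the only stored field would be the id, and the lookup in search() would be the step that maps each hit back to the model object.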

> (After looking at the UML diagram again:) Unless you allow POJOs to be put
> in a Document?

That is the hypothesis. I've actually been a bit baffled by the results I've 
seen over the last few days while benchmarking. 

The application this was originally built for (the one with 250 000 documents) 
is fairly busy: on average one query every 10 ms, 24/7, peaking at one every 
2 ms. On the single-machine setup with 4 GB and Solaris, the CPU went from 90% 
busy to 90% idle when switching from RAMDirectory to InstantiatedIndex. At this 
point I cannot say whether this is due to bad use of Lucene that I compensated 
for with a crazy solution. But I don't think so. I think I've missed a bunch of 
benchmark factors.

Since that project, and that was some time ago, I have not implemented any 
applications with a "normal" corpus using InstantiatedIndex. 

It is the backbone of the active cache (also available in this patch). I'm sure 
people have made similar things with MemoryIndex. For each batch of new 
documents inserted, I apply the cached queries to the batch index to detect 
whether the new data would affect the results associated with a cached query. 
(The cache does other active things too.)
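
A rough sketch of that batch check, using only the Java standard library rather than the classes in the patch. Everything named here is an assumption made for illustration: cached queries are reduced to single terms, the "batch index" is just the set of terms in the new documents, and affected entries are simply evicted (the text above only says they are detected).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy "active cache": cached queries are re-checked against each new batch only. */
final class ActiveCacheSketch {
    // Hypothetical cache layout: query term -> cached result (document ids).
    private final Map<String, List<Integer>> cachedResults = new LinkedHashMap<>();

    void cache(String queryTerm, List<Integer> result) {
        cachedResults.put(queryTerm.toLowerCase(Locale.ROOT), result);
    }

    boolean isCached(String queryTerm) {
        return cachedResults.containsKey(queryTerm.toLowerCase(Locale.ROOT));
    }

    /**
     * Builds a small term set for the batch, runs every cached query against
     * it, and evicts the queries the batch would affect (eviction is an
     * assumption of this sketch). Returns the evicted query terms.
     */
    List<String> applyBatch(List<String> newDocuments) {
        Set<String> batchTerms = new HashSet<>();
        for (String doc : newDocuments) {
            batchTerms.addAll(Arrays.asList(doc.toLowerCase(Locale.ROOT).split("\\s+")));
        }
        List<String> affected = new ArrayList<>();
        for (String queryTerm : cachedResults.keySet()) {
            if (batchTerms.contains(queryTerm)) {
                affected.add(queryTerm);
            }
        }
        cachedResults.keySet().removeAll(affected);
        return affected;
    }
}
```

The design point is that the cached queries only ever run against the small batch, never against the full corpus, which is where a fast in-memory index for small document sets pays off.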

In the didyoumean issue I use InstantiatedIndex as a speedy a priori index: a 
small index of feature-selected text (common user queries known to be 
correct, very common phrases in document titles, etc.) that is used to build 
ngrams for token suggestions, build phrase suggestions, rearrange term order in 
phrases, etc. As these documents are very small (a short phrase), it is some 
10x-20x faster than a RAMDirectory at 50 000 documents.
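
To make the ngram step concrete, here is a small standard-library sketch of token suggestion by character n-gram overlap. It is not the code from the didyoumean issue; the class name NgramSuggester, the overlap scoring and the list of known tokens are all assumptions chosen for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class NgramSuggester {
    /** Character n-grams of a token; ngrams("lucene", 3) is {luc, uce, cen, ene}. */
    static Set<String> ngrams(String token, int n) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + n <= token.length(); i++) {
            grams.add(token.substring(i, i + n));
        }
        return grams;
    }

    /** Returns the known token sharing the most n-grams with the input, or null if none overlap. */
    static String suggest(String input, List<String> knownTokens, int n) {
        Set<String> inputGrams = ngrams(input, n);
        String best = null;
        int bestShared = 0;
        for (String candidate : knownTokens) {
            Set<String> shared = ngrams(candidate, n);
            shared.retainAll(inputGrams); // keep only the grams both tokens have
            if (shared.size() > bestShared) {
                bestShared = shared.size();
                best = candidate;
            }
        }
        return best;
    }
}
```

In an index-backed version the grams of each known token would be indexed as terms and the scan over knownTokens replaced by a query, which is the part that benefits from a fast index over many tiny documents.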



> InstantiatedIndex - faster but memory consuming index
> -----------------------------------------------------
>
>                 Key: LUCENE-550
>                 URL: https://issues.apache.org/jira/browse/LUCENE-550
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Store
>    Affects Versions: 2.0.0
>            Reporter: Karl Wettin
>         Assigned To: Karl Wettin
>         Attachments: HitCollectionBench.jpg, lucene-550.jpg, 
> test-reports.zip, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, 
> trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, 
> trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2
>
>
> A non-file-centric, all-in-memory index. Consumes some 2x the memory of a 
> RAMDirectory (in a term-saturated index) but is between 3x-60x faster 
> depending on application and how one counts. Average query is about 8x faster. 
> IndexWriter and IndexModifier have been realized in InterfaceIndexWriter and 
> InterfaceIndexModifier. 
> InstantiatedIndex is wrapped in a new top layer index facade (class Index) 
> that comes with factory methods for writers, readers and searchers for uniform 
> index handling. There are decorators with notification handling that can be 
> used for automatically synchronizing searchers on updates, etc. 
> Index also comes with FS/RAMDirectory implementation.


