[jira] Commented: (LUCENE-1317) Add InstantiatedIndexWriter.addIndexes(IndexReader[] readers)

Karl Wettin (JIRA) Thu, 26 Jun 2008 10:36:08 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608513#action_12608513
 ]


Karl Wettin commented on LUCENE-1317:
-------------------------------------

bq. Can use the org.apache.lucene.search.highlight.TokenSources for the 
TokenStreams.

TokenSources only handles one document at a time. It is much more efficient to 
create all documents in a single enumeration of the source reader. 

I'm thinking something like this:
 * Load all term vector offsets in a Map</**document number*/ Integer, 
Map<Term, int[]>>.
 * Create a Document[] with all documents from the source reader.
 * Enumerate all terms and document positions and fill up some sort of token 
stream factory per field and document: Map</**doc*/ Integer, 
Map</**field*/ String, Map</**pos*/ Integer, List<Token>>>>. It would be really 
nice if Tokens that are equal (text, offsets, payload, etc.) were reused, but the 
cost of the equality check should probably be benchmarked first.
 * Add all documents to an InstantiatedIndexWriter.
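
A rough sketch of the third step, in plain Java with no Lucene dependency. 
Everything in it is an assumption made for illustration: Token and Posting are 
hypothetical stand-ins (not the Lucene classes), payloads are left out, and 
the real enumeration of terms and positions from the source reader is 
replaced by calls to add(). It only shows the nested doc/field/position map 
and the reuse of equal Tokens; whether that reuse pays off would still need 
to be benchmarked.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

/** Plain-Java sketch; none of these types are Lucene classes. */
final class TokenStreamSketch {

    /** Hypothetical stand-in for a token: text plus start/end offsets (no payload). */
    static final class Token {
        final String text;
        final int startOffset;
        final int endOffset;

        Token(String text, int startOffset, int endOffset) {
            this.text = text;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Token)) return false;
            Token t = (Token) o;
            return text.equals(t.text) && startOffset == t.startOffset && endOffset == t.endOffset;
        }

        @Override public int hashCode() {
            return Objects.hash(text, startOffset, endOffset);
        }

        @Override public String toString() {
            return text + "[" + startOffset + "," + endOffset + ")";
        }
    }

    /** Hypothetical flattened posting: one occurrence of a term in a field of a document. */
    static final class Posting {
        final int doc;
        final String field;
        final String text;
        final int position;
        final int startOffset;
        final int endOffset;

        Posting(int doc, String field, String text, int position, int startOffset, int endOffset) {
            this.doc = doc;
            this.field = field;
            this.text = text;
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** Canonical instances, so that equal tokens are shared instead of duplicated. */
    private final Map<Token, Token> interned = new HashMap<>();

    /** doc -> field -> position -> tokens at that position; TreeMaps keep keys sorted. */
    final Map<Integer, Map<String, Map<Integer, List<Token>>>> tokens = new TreeMap<>();

    /** Records one posting, reusing an existing equal Token when there is one. */
    void add(Posting p) {
        Token candidate = new Token(p.text, p.startOffset, p.endOffset);
        Token existing = interned.putIfAbsent(candidate, candidate);
        Token token = existing != null ? existing : candidate;
        tokens.computeIfAbsent(p.doc, d -> new TreeMap<>())
              .computeIfAbsent(p.field, f -> new TreeMap<>())
              .computeIfAbsent(p.position, x -> new ArrayList<>())
              .add(token);
    }

    /** Returns the tokens of one field of one document in position order. */
    List<Token> stream(int doc, String field) {
        List<Token> out = new ArrayList<>();
        Map<String, Map<Integer, List<Token>>> fields = tokens.get(doc);
        if (fields == null || !fields.containsKey(field)) {
            return out;
        }
        for (List<Token> atPosition : fields.get(field).values()) {
            out.addAll(atPosition);
        }
        return out;
    }

    public static void main(String[] args) {
        TokenStreamSketch sketch = new TokenStreamSketch();
        // Postings arrive in term order, not in position order.
        sketch.add(new Posting(0, "body", "brown", 1, 4, 9));
        sketch.add(new Posting(0, "body", "fox", 2, 10, 13));
        sketch.add(new Posting(0, "body", "the", 0, 0, 3));
        sketch.add(new Posting(1, "body", "the", 0, 0, 3));
        System.out.println(sketch.stream(0, "body"));
    }
}
```

The position-keyed TreeMap is what restores token order, since a term 
enumeration visits postings term by term rather than in document order. The 
"the" token in documents 0 and 1 ends up as a single shared instance, at the 
cost of one hash lookup per posting.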


> Add InstantiatedIndexWriter.addIndexes(IndexReader[] readers)
> -------------------------------------------------------------
>
>                 Key: LUCENE-1317
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1317
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>            Reporter: Jason Rutherglen
>
> Enable InstantiatedIndexWriter to have IndexReaders passed in like 
> IndexWriter and merged into the index.  
> Karl mentioned:
> bq: It's doable. The simplest solution I can think of is to reconstruct all 
> the documents in one single enumeration of the source index and then add them 
> to the writer. I'm however not certain this is the best way nor if 
> InstantiatedIndexWriter is the place for the code.
> How would the documents be reconstructed without creating a lot of overhead?  
> It seems like InstantiatedIndexWriter is the right place, given it is 
> presumably more efficient to recreate all the IndexReaders and then commit?  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


