[
https://issues.apache.org/jira/browse/LUCENE-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608513#action_12608513
]
Karl Wettin commented on LUCENE-1317:
-------------------------------------
bq. Can use the org.apache.lucene.search.highlight.TokenSources for the
TokenStreams.
TokenSources only handles one document at a time. It is much more efficient to
create all documents in a single enumeration of the source reader.
I'm thinking something like this:
* Load all term vector offsets in a Map</**document number*/ Integer,
Map<Term, int[]>>.
* Create a Document[] with all documents from the source reader.
* Enumerate all terms and document positions and fill up some sort of token
stream factory per field and document: Map</**doc*/Integer,
Map</**field*/String, Map</**pos*/ Integer, List<Token>>>>. It would be really
nice if Tokens that are equal (text, offsets, payload, etc.) were reused, but the
cost of the equality check should probably be benchmarked first.
* Add all documents to an InstantiatedIndexWriter.
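
A rough sketch of the per-document token map and the token reuse idea from the
list above, in plain Java with no Lucene dependency. TokenCollector and
SketchToken are hypothetical names made up for illustration; SketchToken is a
simplified stand-in for a real Token (no payload), and none of this is the
actual InstantiatedIndexWriter code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

/** Hypothetical helper: collects tokens per document, field and position. */
final class TokenCollector {

    /** Simplified stand-in for a Lucene Token (assumption: no payload). */
    static final class SketchToken {
        final String text;
        final int startOffset;
        final int endOffset;

        SketchToken(String text, int startOffset, int endOffset) {
            this.text = text;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof SketchToken)) return false;
            SketchToken other = (SketchToken) o;
            return startOffset == other.startOffset
                    && endOffset == other.endOffset
                    && text.equals(other.text);
        }

        @Override
        public int hashCode() {
            return Objects.hash(text, startOffset, endOffset);
        }
    }

    // doc number -> field name -> position -> tokens at that position.
    // The innermost map is a TreeMap so positions iterate in order.
    private final Map<Integer, Map<String, Map<Integer, List<SketchToken>>>> byDoc =
            new HashMap<>();

    // Canonical instances, so tokens that are equal share one object.
    private final Map<SketchToken, SketchToken> interned = new HashMap<>();

    /** Records one token occurrence, reusing an equal token if one exists. */
    void add(int doc, String field, int position, String text, int start, int end) {
        SketchToken candidate = new SketchToken(text, start, end);
        SketchToken previous = interned.putIfAbsent(candidate, candidate);
        SketchToken token = (previous != null) ? previous : candidate;
        byDoc.computeIfAbsent(doc, d -> new HashMap<>())
                .computeIfAbsent(field, f -> new TreeMap<>())
                .computeIfAbsent(position, p -> new ArrayList<>())
                .add(token);
    }

    /** Returns the tokens of one field of one document in position order. */
    List<SketchToken> tokens(int doc, String field) {
        List<SketchToken> out = new ArrayList<>();
        Map<String, Map<Integer, List<SketchToken>>> fields = byDoc.get(doc);
        if (fields == null) return out;
        Map<Integer, List<SketchToken>> positions = fields.get(field);
        if (positions == null) return out;
        for (List<SketchToken> atPosition : positions.values()) {
            out.addAll(atPosition);
        }
        return out;
    }

    /** Number of distinct token instances held after interning. */
    int distinctTokens() {
        return interned.size();
    }
}
```

The interning map pays one hashCode and equals call per occurrence, which is
the cost that would need benchmarking. Sharing instances like this is only safe
as long as the tokens are never modified after they are collected.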
> Add InstantiatedIndexWriter.addIndexes(IndexReader[] readers)
> -------------------------------------------------------------
>
> Key: LUCENE-1317
> URL: https://issues.apache.org/jira/browse/LUCENE-1317
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/*
> Reporter: Jason Rutherglen
>
> Enable InstantiatedIndexWriter to have IndexReaders passed in like
> IndexWriter and merged into the index.
> Karl mentioned:
> bq: It's doable. The simplest solution I can think of is to reconstruct all
> the documents in one single enumeration of the source index and then add them
> to the writer. I'm however not certain this is the best way nor if
> InstantiatedIndexWriter is the place for the code.
> How would the documents be reconstructed without creating a lot of overhead?
> It seems like InstantiatedIndexWriter is the right place, given it is
> presumably more efficient to recreate all the IndexReaders and then commit?