[ https://issues.apache.org/jira/browse/LUCENE-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608513#action_12608513 ]
Karl Wettin commented on LUCENE-1317:
-------------------------------------

bq. Can use the org.apache.lucene.search.highlight.TokenSources for the TokenStreams.

TokenSources only handles one document at a time. It is much more efficient to create all documents in a single enumeration of the source reader. I'm thinking of something like this:

* Load all term vector offsets into a Map</**document number*/ Integer, Map<Term, int[]>>.
* Create a Document[] with all documents from the source reader.
* Enumerate all terms and document positions and fill up some sort of token stream factory per field and document: Map</**doc*/ Integer, Map</**field*/ String, Map</**pos*/ Integer, List<Token>>>>. It would be really nice if Tokens that are equal (text, offsets, payload, etc.) were reused, but the cost of the equality check should probably be benchmarked first.
* Add all documents to an InstantiatedIndexWriter.

> Add InstantiatedIndexWriter.addIndexes(IndexReader[] readers)
> -------------------------------------------------------------
>
>                 Key: LUCENE-1317
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1317
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>            Reporter: Jason Rutherglen
>
> Enable InstantiatedIndexWriter to have IndexReaders passed in like
> IndexWriter and merged into the index.
> Karl mentioned:
> bq: It's doable. The simplest solution I can think of is to reconstruct all
> the documents in one single enumeration of the source index and then add them
> to the writer. I'm however not certain this is the best way nor if
> InstantiatedIndexWriter is the place for the code.
> How would the documents be reconstructed without creating a lot of overhead?
> It seems like InstantiatedIndexWriter is the right place, given it is
> presumably more efficient to recreate all the IndexReaders and then commit?
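
For illustration only, the third step of the plan in Karl's comment (collecting tokens per document, field and position, and reusing equal tokens) might be sketched as below. This is not code from the issue and it deliberately avoids the Lucene API: SketchToken and TokenCollector are hypothetical stand-ins, payloads are left out, and the sharing pool is just one possible way to get the reuse the comment asks for. Its cost would still need the benchmarking mentioned there.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

/** Hypothetical stand-in for a token; NOT org.apache.lucene.analysis.Token. */
final class SketchToken {
    final String text;
    final int startOffset;
    final int endOffset;

    SketchToken(String text, int startOffset, int endOffset) {
        this.text = text;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SketchToken)) return false;
        SketchToken other = (SketchToken) o;
        return startOffset == other.startOffset
                && endOffset == other.endOffset
                && text.equals(other.text);
    }

    @Override
    public int hashCode() {
        return Objects.hash(text, startOffset, endOffset);
    }
}

/** Hypothetical collector filled during one enumeration of the source. */
final class TokenCollector {
    // doc number -> field name -> position -> tokens at that position
    private final Map<Integer, Map<String, Map<Integer, List<SketchToken>>>> tokens =
            new HashMap<>();
    // canonical instances, so equal tokens are shared instead of duplicated
    private final Map<SketchToken, SketchToken> pool = new HashMap<>();

    void add(int doc, String field, int position, String text, int start, int end) {
        SketchToken shared = pool.computeIfAbsent(new SketchToken(text, start, end), t -> t);
        tokens.computeIfAbsent(doc, d -> new HashMap<>())
              // TreeMap keeps positions sorted even though terms arrive in term order
              .computeIfAbsent(field, f -> new TreeMap<>())
              .computeIfAbsent(position, p -> new ArrayList<>())
              .add(shared);
    }

    /** Returns the tokens of one field of one document in position order. */
    List<SketchToken> stream(int doc, String field) {
        List<SketchToken> out = new ArrayList<>();
        Map<String, Map<Integer, List<SketchToken>>> fields = tokens.get(doc);
        if (fields == null) return out;
        Map<Integer, List<SketchToken>> byPosition = fields.get(field);
        if (byPosition == null) return out;
        for (List<SketchToken> atPosition : byPosition.values()) {
            out.addAll(atPosition);
        }
        return out;
    }

    /** Number of distinct token instances held in the sharing pool. */
    int distinctTokens() {
        return pool.size();
    }
}
```

In a real implementation the add(...) calls would presumably be driven by the enumeration of terms and positions over the source reader, and each stream(doc, field) result would back a TokenStream for the reconstructed Document before it is handed to the InstantiatedIndexWriter.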