I'm always skeptical of storing doc IDs, since they can change out from underneath you (just delete even a single document and optimize). What is it you're doing with the doc ID that you couldn't do with the guid? If your "guid list" were ordered, I can imagine building filters from it quite quickly, using TermDocs.skipTo for instance.
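The idea of building a filter from an ordered guid list can be modelled in plain Java. This is a minimal sketch, not Lucene code. It assumes a hypothetical sorted term dictionary (`terms`) with a parallel array of postings lists (`postings`) and a sorted list of wanted GUIDs. The names `GuidFilterSketch` and `buildFilter` are made up for illustration. Because both inputs are sorted, the cursor into the dictionary only ever moves forward, and each lookup searches only the terms that remain. In Lucene 3.0.x the per-term lookup would go through `TermDocs.seek`/`next` instead, which is not shown here.

```java
import java.util.Arrays;
import java.util.BitSet;

/**
 * Plain-Java model (no Lucene dependency) of building a filter bitset from an
 * ordered GUID list. Hypothetical "index": terms[i] is a GUID, postings[i] lists
 * the doc IDs that contain it. terms must be sorted in String order.
 */
final class GuidFilterSketch {

    /** Returns a bitset with a bit set for every doc containing one of sortedGuids. */
    public static BitSet buildFilter(String[] terms, int[][] postings,
                                     String[] sortedGuids, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        int t = 0; // cursor into the term dictionary; it never moves backwards
        for (String guid : sortedGuids) {
            if (t == terms.length) {
                break; // dictionary exhausted: no later GUID can match
            }
            // Search only the terms at or after the cursor.
            int idx = Arrays.binarySearch(terms, t, terms.length, guid);
            if (idx >= 0) {
                for (int doc : postings[idx]) {
                    bits.set(doc); // mark every doc containing this GUID
                }
                t = idx + 1;
            } else {
                t = -idx - 1; // insertion point: first term greater than guid
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        String[] terms = {"guid-a", "guid-c", "guid-d", "guid-f"};
        int[][] postings = {{0}, {2}, {3}, {5}};
        String[] wanted = {"guid-f", "guid-b", "guid-c"};
        Arrays.sort(wanted); // the technique relies on the GUID list being ordered
        System.out.println(buildFilter(terms, postings, wanted, 6)); // prints {2, 5}
    }
}
```

Running `main` prints `{2, 5}`. `guid-b` is not in the dictionary and is simply skipped, without rewinding the cursor.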
Or is this entirely unreasonable???

Best
Erick

On Mon, Mar 28, 2011 at 8:31 PM, Trejkaz <trej...@trypticon.org> wrote:
> Hi all.
>
> I'm trying to parallelise writing documents into an index. Let's set
> aside the fact that 3.1 is much better at this than 3.0.x... but I'm
> using 3.0.3.
>
> One of the things I need to know is the doc ID of each document added
> so that we can add them into auxiliary database tables which are keyed
> by it. If multiple threads are using the same writer, I can still do
> this as follows:
>
>   IndexWriter writer;
>   boolean parallel;
>
>   // ...
>
>   private int addDocument(String guid, ...) throws IOException {
>     Document doc = new Document();
>     doc.add(new Field("guid", guid, Store.YES, Index.ANALYZED));
>     // eliding other fields
>     writer.addDocument(doc);
>
>     if (parallel) {
>       // Near-real-time reader: look the new document up by its GUID.
>       IndexReader realTimeReader = writer.getReader();
>       try {
>         TermDocs termDocs = realTimeReader.termDocs();
>         try {
>           termDocs.seek(new Term("guid", guid));
>           if (termDocs.next()) {
>             return termDocs.doc();
>           } else {
>             throw new IllegalStateException(String.format(
>                 "We added item with GUID %s but it wasn't found immediately afterwards", guid));
>           }
>         } finally {
>           termDocs.close();
>         }
>       } finally {
>         realTimeReader.close();
>       }
>     } else {
>       // maxDoc() already counts the document just added, so its ID is one less.
>       return writer.maxDoc() - 1;
>     }
>   }
>
> Benchmarking this for a single thread, there is a difference in cost
> between doing it using a search and doing it by calling maxDoc(), as
> you might expect:
>
>   Time for parallel-safe version: 147.561s
>   Time for unsafe version:         62.603s
>
> Is there a way to achieve this result with less overhead?
>
> (Note: for reasons of performance, we cannot use a field to store an
> ID to use for database tables, as this is several orders of magnitude
> slower when you need to build a filter based on a database query.)
>
> TX
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org