Well, maybe if I'd read the original post more carefully I'd have figured that out, sorry 'bout that.
I *think* I remember reading somewhere on the email lists that your
indexing speed goes up pretty linearly as the number of indexing tasks
approaches the number of CPUs. Are you, perhaps, on a dual-core machine?
But do search the mail archives because my memory may not be accurate.

You can easily combine indexes with IndexWriter.addIndexes, BTW.
Personally I prefer fewer indexes if you can get away with it. But I'd
only try this after Michael's suggestion of using multiple threads on a
single underlying writer.

You could even think about using N machines to create M fragments, then
combining them all afterwards, if your logs are static enough to make
that reasonable. Combining indexes may take a while though.....

Best
Erick

On Wed, Dec 17, 2008 at 10:46 AM, Preetham Kajekar <preet...@cisco.com> wrote:

> Hi Erick,
> Thanks for the response. Replies inline.
>
> Erick Erickson wrote:
>
>> The very first question is always "are you opening a new searcher
>> each time you query"? But you've looked at the Wiki so I assume not.
>> This question is closely tied to what kind of latency you can tolerate.
>>
>> A few more details, please. What's slow? Queries? Indexing?
>
> Indexing. Again, it is not slow. It is just faster with two separate
> indexers in two threads.
>
>> How slow? 100ms? 100s? What are your target times and
>> what are you seeing?
>
> With a single indexer in a single thread, I can index about 20,000 event
> objects per second. With 2 threads and 2 indexers, it is close to 50,000. :-)
>
>> How big is your index? 100M? 100G? What kind of VM
>> parameters are you specifying?
>
> The index will have about 20mil entries. The size of the index ends up
> being about 500M.
> I start the VM with 1G of heap. No other options (GC etc.) are used.
>
>> As an aside, do note that there's no requirement in Lucene that
>> each document have the same fields, so it's unclear why you
>> need two indexes, but perhaps some of the answers to the above
>> will help us understand.
>
> Like I mentioned, Lucene does the job much faster with two indexes.
>
>> Also, be very very careful what you measure when you measure
>> queries. You absolutely *have* to put some instrumentation in
>> the code since "slow queries" can result from things other than
>> searching. For instance, iterating over a Hits object for 100s of
>> documents....
>
> The query speeds are much faster than what I need :-) So no complaints here.
>
>> Show the code, man <G>!
>
> Code below. EvIndexer is the base class. There are two subclasses which
> implement addEvFieldsToIndexDoc() (template pattern) to add different fields
> to the index. That code is also pasted below.
>
> --Code ---
>
> BaseClass
>
> public EvIndexer(String indexName) throws Exception {
>     this.name = indexName;
>     a = new KeywordAnalyzer();
>     INDEX_PATH = System.getProperty(StoreManager.PROP_DB_DB_LOC,
>             "./index/");
>     FSDirectory directory = FSDirectory.getDirectory(INDEX_PATH +
>             File.separatorChar + indexName, NoLockFactory.getNoLockFactory());
>     indexWriter = new IndexWriter(directory, a,
>             IndexWriter.MaxFieldLength.LIMITED);
>     //indexWriter.setUseCompoundFile(false);
>     //indexWriter.setRAMBufferSizeMB(256);
> }
>
> /** Method implemented by extending classes to add data into the
>  * index document for the given event
>  *
>  * @param d
>  */
> protected abstract void addEvFieldsToIndexDoc(Document d, Ev event);
>
> public void addToIndex(Ev ev) throws Exception {
>     noOfEventsIndexed++;
>     Document d = new Document();
>     addEvFieldsToIndexDoc(d, ev);
>     indexWriter.addDocument(d);
>     if ((noOfEventsIndexed % COMMIT_INTERVAL) == 0) {
>         System.out.println(name + " indexed " +
>                 NumberFormat.getInstance().format(noOfEventsIndexed) +
>                 " Committing them");
>         commit();
>     }
> }
>
> DerivedClass1
>
> protected void addEvFieldsToIndexDoc(Document d, Ev ev) {
>     //noOfEventsIndexed++;
>     Field id = new Field(EV_ID, Long.toString(ev.getId()),
>             Field.Store.YES, Field.Index.NO);
>     Field src = new Field(EV_SRC, Long.toString(ev.getSrcId()),
>             Field.Store.NO, Field.Index.NOT_ANALYZED);
>     Field type = new Field(EV_TYPE,
>             Integer.toString(ev.getEventTypeId()), Field.Store.NO,
>             Field.Index.NOT_ANALYZED);
>     Field pri = new Field(EV_PRI, Short.toString(ev.getPriority()),
>             Field.Store.NO, Field.Index.NOT_ANALYZED);
>     Field time = new Field(EV_TIME, getHexString(ev.getRecvTime()),
>             Field.Store.NO, Field.Index.NOT_ANALYZED);
>     d.add(id);
>     d.add(src);
>     d.add(type);
>     d.add(pri);
>     d.add(time);
>     //noOfFieldsIndexed += 4;
> }
>
> Thanks for the support.
> ~preetham
>
>> Best
>> Erick
>>
>> On Wed, Dec 17, 2008 at 9:40 AM, Preetham Kajekar <preet...@cisco.com> wrote:
>>
>>> Hi Grant,
>>> Thanks for the response. Replies inline.
>>>
>>> Grant Ingersoll wrote:
>>>
>>>> On Dec 17, 2008, at 12:57 AM, Preetham Kajekar wrote:
>>>>
>>>>> Hi,
>>>>> I am new to Lucene. I am not using it as a pure text indexer.
>>>>>
>>>>> I am trying to index a Java object which has about 10 fields (like id,
>>>>> time, srcIp, dstIp) - most of them being numerical values.
>>>>> In order to speed up indexing, I figured that having two separate
>>>>> indexers, each of them indexing a different set of fields, works great.
>>>>> So I have the first 5 fields in index1 and the remaining in index2.
>>>>
>>>> Can you explain this a bit more? Are those two fields really large or
>>>> something? How are you obtaining them? How are you correlating the
>>>> documents between the two indexes? Did you actually try a single index
>>>> and it was too slow?
>>>
>>> I have a java object which has about 10 fields. However, the fields are
>>> not fixed.
>>> The java object is essentially a representation of syslogs from
>>> network devices. So different syslogs have different fields. Each field
>>> has a unique id and a value (mostly numeric types, so I convert it to
>>> a string). There are some fixed fields. So the object is a list of
>>> fields which is produced by a parser.
>>> I am trying to index using two indexers in two separate threads - one
>>> for the fixed and another for the non-fixed fields. Except for a unique
>>> id, I do not store the fields in Lucene - I just index them. From the
>>> index, I get the unique id, which is all I care about. (The objects are
>>> stored elsewhere and can be looked up based on this unique id.)
>>> I did try using a single indexer, but things were quite slow. Getting
>>> high throughput is crucial and having two indexers seemed to do very
>>> well (more than twice as fast).
>>>
>>> Further, the index will never be modified and I can have just one thread
>>> writing to the index. Any other performance tips would be very helpful.
>>> I have already looked at the wiki link regarding performance and am
>>> using some of the tips.
>>>
>>> Thanks,
>>> ~preetham
>>>
>>>>> Now, I want to have boolean AND queries looking for values in both
>>>>> indexes. Like f1=1234 AND f7=ABCD. f1 and f7 are present in two
>>>>> separate indexes. Would using the MultiIndexReader help? Since I am
>>>>> doing an AND, I don't expect that it would work.
>>>>>
>>>>> Thanks,
>>>>> ~preetham
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>>
>>>> --------------------------
>>>> Grant Ingersoll
>>>>
>>>> Lucene Helpful Hints:
>>>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>>>> http://wiki.apache.org/lucene-java/LuceneFAQ
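
P.S. For the "multiple threads on a single underlying writer" idea mentioned
at the top of this message, here is a rough sketch of the threading pattern
only. It deliberately does not touch Lucene: StubWriter, indexAll and the
"event-..." strings are all made up for illustration, and StubWriter merely
stands in for one shared, thread-safe writer. In real code the shared object
would be your single IndexWriter, and you would want to measure whether one
worker per CPU is actually the sweet spot on your hardware.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class SharedWriterSketch {

    /** Hypothetical stand-in for ONE shared, thread-safe index writer. */
    static final class StubWriter {
        private final AtomicLong docCount = new AtomicLong();

        // A real writer would analyze and index the document here;
        // this stub only counts how many documents it was handed.
        void addDocument(String doc) {
            docCount.incrementAndGet();
        }

        long docCount() {
            return docCount.get();
        }
    }

    /** Feeds a single shared writer from several worker threads. */
    static long indexAll(int threads, int docsPerThread) throws InterruptedException {
        StubWriter writer = new StubWriter();               // one writer for all threads
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int worker = t;
            pool.execute(() -> {
                for (int i = 0; i < docsPerThread; i++) {
                    writer.addDocument("event-" + worker + "-" + i);
                }
            });
        }
        pool.shutdown();                                    // accept no new tasks
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {  // wait for the workers
            throw new IllegalStateException("indexing tasks did not finish in time");
        }
        return writer.docCount();
    }

    public static void main(String[] args) throws InterruptedException {
        // One worker per available CPU, following the rule of thumb above.
        int threads = Runtime.getRuntime().availableProcessors();
        long total = indexAll(threads, 10_000);
        System.out.println(threads + " threads added " + total + " documents");
    }
}
```

The point of the sketch is that all workers share one writer and therefore
one index, so nothing has to be merged afterwards and an AND query across
all fields sees every field in the same document.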