hi erick, nr of events are 107/sec in average with 400/sec peak and 20/sec low. between searchable should be less than 20 minutes. we are planning to index IN RAM only for a duration of one day MAX. per lucene process on the operating system.
currently we need 500 M RAM for indexing one day (just storing the eventids and indexing (without storing) highly redundant event descriptions. collecting all eventdescriptions costs us additionally 3G ram (which is very much :-( for us.) @PositionIncrementGap or SpanQueries or the proximity operator ... sorry i am bloody beginner i don't really kow what you are talking about. a real update would be perfect... but i think from the current design it is not possible to extract all unstemmed keywords from a HIT? or is this possible? update then would be: search eventid get hits (should be one) extract all keywords from hit add new information plus hits newly to the index delete the hit. is there a possibility to gather detailed information about the index itself, that i can give you a detailed idea how big / and in which condition it is? regards chris On Wed, Feb 18, 2009 at 5:38 PM, Erick Erickson <erickerick...@gmail.com> wrote: > You could always sort by EVENTID, that way at least > you'd have all the events for a particular ID together > in your results. You'd have to post-filter the results to > determine whether all the necessary descriptions were > present. But I don't think this works all that well because, > as you pointed out, you may have a lot of records to > sort through so I don't think this is a very good idea... > > > > How many events are we talking about here and what > kind of lag between an event and being able to search it > can you tolerate? I guess what I'm really asking is whether > it's possible to recreate your index "often enough" to > satisfy your users. If so, you can index multiple > descriptions in a single document, something like > > doc.add("EVENTDESCRIPTION", "STARTING EVENT") > doc.add("EVENTDESCRIPTION", "XYZ") > doc.add("EVENTDESCRIPTION", "ABC") > doc.add("EVENTID", "1") > IndexWriter.addDocument(doc); > > > You'd have to gather all the descriptions related > to each EVENTID before you were able to index the doc..... > > By manipulating the PositionIncrementGap you could also > keep searches from matching across different EVENTDESCRIPTIONs, > e.g. if you didn't want to match +STARTING +ABC you could use > SpanQueries or the proximity operator, but going into details > depends upon whether you can rebuild your index so we'll defer > that part.... > > You could also think about updating the document when new events > were added, but since an update is really a delete/add under the > covers you'd have to either gather enough information from what I > assume is your log or store enough information with the document to > recreate it. > > How big is your index currently and what kind of throughput do you > require? > > Best > Erick > > > On Wed, Feb 18, 2009 at 10:20 AM, Christian Brennsteiner < > christ...@brennsteiner.at> wrote: > >> dear lucene community, >> >> i am playing around with lucene right now. and have come to very bad >> problem. >> >> given environment: >> >> a signal source gives signals with eventids ans eventdescriptions >> >> for example EVENTID=1 and EVENTDESCRIPTION="STARTING EVENT" >> >> those events can be running very long (e.g. one month) during this >> period we will receive for example >> >> EVENTID=1 and EVENTDESCRIPTION="EXECUTING XYZ" >> 10 minutes later >> EVENTID=1 and EVENTDESCRIPTION="EXECUTING YZA" >> 10 minutes later >> EVENTID=1 and EVENTDESCRIPTION="PASSED MILESTONE1" >> 10 minutes later >> EVENTID=1 and EVENTDESCRIPTION="EXECUTING ZAB" >> >> after e.g. 1 week we receive >> EVENTID=1 and EVENTDESCRIPTION="STOPING EVENT" >> >> what i want: >> i want to be able to search e.g. which eventids are connected to "XYZ" >> AND "ZAB" AND have already passed "MILESTONE1" >> >> so my current try is to index all events by full indexing (without >> storing) eventdescriptions AND stemming e.g. EXECUTING >> >> then searching for "+XYZ +ZAB +MILESTONE1" >> --> result no document since those are all seperated documents >> when i search >> "XYZ ZAB MILESTONE1" >> i am getting 3 times EVENTID 3 >> --> this is bad since when i get 1000000 of such events how do i rank them? >> >> CONCLUSION: >> my biggest problem is that my lucene document given to the index >> currently is not in a final state BUT i have to index and search it >> also while it is in progress. >> as a result of this the ranking as i do it now has no real value since >> the ranking is just based on a "line of a whole event" >> >> QUESTION: >> is there a solution within lucene to combine search results? e.g. merge >> them OR >> is there a better workaround how i would do such updates to the index >> without storing the original docmuent inside the index (since this >> consumes so many space)? e.g. extracting the keywords that were stored >> for the item? >> >> any hints appreciated. >> >> regards chris >> >> >> ---------- >> Christian Brennsteiner >> Salzburg / Austria / Europe >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> For additional commands, e-mail: java-user-h...@lucene.apache.org >> >> > -- ---------- Christian Brennsteiner Salzburg / Austria / Europe --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org