: In addition to the full text search, I'd like to be able to perform searches
: such as:
: - list sessions from:xxx timestamp:200808*
: - list sessions (from:xxx OR from:yyy)
: - etc
:
: Would it be better to store each message as a separate document with its
: fields, adding the 'filename' (session identifier) as an extra field? or
: maybe is there a better way of doing it making the session file a document?
As a general rule of thumb, you make one document for each result you want to get back when you execute a search. If you want to be able to search for "foo" and get back a list of all sessions where the word "foo" was used, then each session should be a document.

If you also want to be able to search for "foo" and get back a list of each message that contained the word "foo", then each message can also be a document, either in another index or even in the same index (there's no rule that says all documents must have the same fields).

BTW: If you are planning on experimenting with the Java API, I would suggest sending any specific followup questions to the [EMAIL PROTECTED] list. But you may also want to consider checking out Solr, and the solr-user list. It depends on what level of abstraction you want to deal with: Solr provides a config-based, web-service-style front end for dealing with Lucene indexes, but it also has a Java API, both for indexing and for hooking in custom functionality when executing searches.

-Hoss
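
To make the rule of thumb above concrete, here is a rough sketch in plain Java. It is a toy in-memory stand-in, not the Lucene or Solr API: the class name, the "type", "id", "session" and "text" field names, the sample data, and the substring matching are all assumptions made up for illustration. It only shows that session documents and message documents can sit side by side with different fields, and that the kind of document you search determines the kind of result you get back.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory "index" for illustration only; this is NOT the Lucene API.
class OneDocPerResultSketch {

    // A "document" is just a bag of named fields; documents need not share fields.
    static Map<String, String> doc(String... keyValues) {
        Map<String, String> d = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            d.put(keyValues[i], keyValues[i + 1]);
        }
        return d;
    }

    // Returns the "id" of every document of the given (hypothetical) type
    // whose "text" field contains the term, ignoring case.
    static List<String> search(List<Map<String, String>> index, String type, String term) {
        List<String> hits = new ArrayList<>();
        for (Map<String, String> d : index) {
            String text = d.getOrDefault("text", "").toLowerCase();
            if (type.equals(d.get("type")) && text.contains(term.toLowerCase())) {
                hits.add(d.get("id"));
            }
        }
        return hits;
    }

    // Made-up sample data: two sessions and their individual messages.
    static List<Map<String, String>> sampleIndex() {
        List<Map<String, String>> index = new ArrayList<>();
        // One document per session: "text" holds the whole conversation.
        index.add(doc("type", "session", "id", "session-1",
                "text", "hello foo how are you"));
        index.add(doc("type", "session", "id", "session-2",
                "text", "nothing relevant here"));
        // One document per message: extra per-message fields plus the session id.
        index.add(doc("type", "message", "id", "session-1/1", "session", "session-1",
                "from", "xxx", "timestamp", "20080801", "text", "hello foo"));
        index.add(doc("type", "message", "id", "session-1/2", "session", "session-1",
                "from", "yyy", "timestamp", "20080801", "text", "how are you"));
        index.add(doc("type", "message", "id", "session-2/1", "session", "session-2",
                "from", "xxx", "timestamp", "20080915", "text", "nothing relevant here"));
        return index;
    }

    public static void main(String[] args) {
        List<Map<String, String>> index = sampleIndex();
        // Searching session documents yields sessions ...
        System.out.println(search(index, "session", "foo"));
        // ... while searching message documents yields individual messages.
        System.out.println(search(index, "message", "foo"));
    }
}
```

In a real index the same idea would be expressed with whatever document and field classes the library provides; the point is only that the unit you index is the unit you get back.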
