Re: RFO- Indexing 'meaningful' xml

Chris Hostetter Wed, 06 Aug 2008 16:00:52 -0700

: In addition to the full text search, I'd like to be able to perform searches
: such as:
:  - list sessions from:xxx timestamp:200808*
:  - list sessions (from:xxx OR from:yyy)
:  - etc
: 
: Would it be better to store each message as a separate document with its
: fields, adding the 'filename' (session identifier) as an extra field? Or
: is there a better way of doing it, such as making the session file a document?


As a general rule of thumb, you make 1 document for each result you want 
to get back when you execute a search ... if you want to be able to search 
for "foo" and get back a list of all sessions where the word "foo" was 
used, then each session should be a document.  If you also want to be able 
to search for "foo" and get back a list of each message that contained the 
word "foo", then each message can also be a document -- either in another 
index, or even in the same index (there's no rule that says all documents 
must have the same fields).
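To make the "one document per result" rule concrete, here is a toy sketch in plain Java. It is NOT the Lucene API: each "document" is just a map from field name to a set of lowercase terms, and every class, method and field name in it ("type", "id", "session", "from", "body") is hypothetical. It indexes both granularities into the same index: one document per session and one per message, told apart by a "type" field. A search then chooses which kind of result it wants back. In real Lucene code each map would roughly correspond to a Document built from Field objects.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class OneDocPerResult {

    /** Hypothetical input shape: one message inside a session. */
    record Message(String from, String text) {}

    /** Creates a "document" (field -> terms) tagged with its type and id. */
    static Map<String, Set<String>> newDoc(String type, String id) {
        Map<String, Set<String>> doc = new HashMap<>();
        doc.put("type", new LinkedHashSet<>(List.of(type)));
        doc.put("id", new LinkedHashSet<>(List.of(id)));
        return doc;
    }

    /** Naive tokenizer: lowercases and splits on non-word characters. */
    static void addTerms(Map<String, Set<String>> doc, String field, String text) {
        Set<String> terms = doc.computeIfAbsent(field, k -> new LinkedHashSet<>());
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) terms.add(t);
        }
    }

    /** One session document per session AND one message document per message, in one index. */
    static List<Map<String, Set<String>>> buildIndex(Map<String, List<Message>> sessions) {
        List<Map<String, Set<String>>> index = new ArrayList<>();
        for (Map.Entry<String, List<Message>> e : sessions.entrySet()) {
            String sessionId = e.getKey();
            Map<String, Set<String>> sessionDoc = newDoc("session", sessionId);
            List<Message> msgs = e.getValue();
            for (int i = 0; i < msgs.size(); i++) {
                Message m = msgs.get(i);
                // The session document accumulates every sender and every word.
                addTerms(sessionDoc, "from", m.from());
                addTerms(sessionDoc, "body", m.text());
                // The message document carries the session id as an extra field.
                Map<String, Set<String>> msgDoc = newDoc("message", sessionId + "#" + i);
                addTerms(msgDoc, "session", sessionId);
                addTerms(msgDoc, "from", m.from());
                addTerms(msgDoc, "body", m.text());
                index.add(msgDoc);
            }
            index.add(sessionDoc);
        }
        return index;
    }

    /** Returns ids of documents of the given type whose field contains the term. */
    static List<String> search(List<Map<String, Set<String>>> index,
                               String type, String field, String term) {
        List<String> hits = new ArrayList<>();
        for (Map<String, Set<String>> doc : index) {
            if (doc.get("type").contains(type)
                    && doc.getOrDefault(field, Set.of()).contains(term.toLowerCase())) {
                hits.add(doc.get("id").iterator().next());
            }
        }
        return hits;
    }

    /** Small made-up sample: two sessions, three messages. */
    static List<Map<String, Set<String>>> sampleIndex() {
        Map<String, List<Message>> sessions = new LinkedHashMap<>();
        sessions.put("s1", List.of(new Message("alice", "foo bar"), new Message("bob", "baz")));
        sessions.put("s2", List.of(new Message("carol", "no match here")));
        return buildIndex(sessions);
    }

    public static void main(String[] args) {
        List<Map<String, Set<String>>> index = sampleIndex();
        System.out.println(search(index, "session", "body", "foo")); // [s1]
        System.out.println(search(index, "message", "body", "foo")); // [s1#0]
        System.out.println(search(index, "session", "from", "bob")); // [s1]
    }
}
```

The same word "foo" returns either the session or the individual message depending only on which document type is asked for, which is the point of choosing documents by the results you want back.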

BTW: If you are planning on experimenting with the Java API, I would 
suggest sending any specific followup questions to the [EMAIL PROTECTED] 
list.  But you may also want to consider checking out Solr, and the 
solr-user list.  It depends on what level of abstraction you want to deal 
with (Solr provides a config-based, web-service-style front end for dealing 
with Lucene indexes, but also has a Java API both for indexing and for 
hooking in custom functionality when executing searches).


-Hoss
