Hello! This is a Request for Opinion aimed at the Lucene experts out there :-)
I'm trying to get to know Lucene a bit better. After playing with the 'getting started' material, I moved on to indexing XML files. The simple (?) project is to index chat sessions, each session stored in its own file and containing many entries of the form:

<message type="incoming_privateMessage" timestamp="200808021312" to="someone%40domain1%2Ecom" from="someoneelse%40domain2%2Ecom"><body>Hello</body></message>

(it's the jabber-client protocol with a timestamp)

In addition to full-text search, I'd like to be able to perform searches such as:

- list sessions from:xxx timestamp:200808*
- list sessions (from:xxx OR from:yyy)
- etc.

Would it be better to store each message as a separate document with its own fields, adding the 'filename' (session identifier) as an extra field? Or is there a better way to do it that makes each session file a single document?

All comments appreciated, thanks! :-)

PS: Of course, the actual goal isn't to index chat history (there are many chat search tools available) but to use this to learn the API ;-)
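
To make the one-document-per-message option concrete, here is a rough sketch in plain Java (no Lucene calls at all) of the fields I would pull out of each message. The class name MessageFields, the <session> wrapper element, and the field names are my own assumptions for illustration, not anything Lucene prescribes; each resulting map would then be turned into one Lucene document.

```java
import java.io.ByteArrayInputStream;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: one flat field map per <message> element.
class MessageFields {

    // Assumes the session file wraps its messages in a single root element.
    static List<Map<String, String>> extract(String sessionXml, String filename)
            throws Exception {
        org.w3c.dom.Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(
                        sessionXml.getBytes(StandardCharsets.UTF_8)));

        NodeList messages = dom.getElementsByTagName("message");
        List<Map<String, String>> docs = new ArrayList<>();
        for (int i = 0; i < messages.getLength(); i++) {
            Element m = (Element) messages.item(i);
            Map<String, String> fields = new LinkedHashMap<>();
            // The session identifier travels with every message.
            fields.put("filename", filename);
            fields.put("type", m.getAttribute("type"));
            fields.put("timestamp", m.getAttribute("timestamp"));
            // Addresses are percent-encoded in the file (%40 is '@', %2E is '.').
            fields.put("from", URLDecoder.decode(m.getAttribute("from"), "UTF-8"));
            fields.put("to", URLDecoder.decode(m.getAttribute("to"), "UTF-8"));
            // The only text inside <message> is the <body> content.
            fields.put("body", m.getTextContent());
            docs.add(fields);
        }
        return docs;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<session><message type='incoming_privateMessage' "
                + "timestamp='200808021312' to='someone%40domain1%2Ecom' "
                + "from='someoneelse%40domain2%2Ecom'><body>Hello</body>"
                + "</message></session>";
        for (Map<String, String> doc : extract(xml, "session1.xml")) {
            System.out.println(doc);
        }
    }
}
```

With this layout, "list sessions" would presumably mean running the query over messages and then collecting the distinct 'filename' values from the hits, which is the part I'm least sure about.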
