Hi everyone! This is my first post here and I'm new to Lucene, so I would appreciate your feedback on the Lucene document design I came up with.
*What is my goal*

I'm trying to index a collection of XML documents that all have the same structure, like this:

    <doc>
      <title> </title>
      <sections>
        <section>
          <title> </title>
          <text> </text>
        </section>
      </sections>
    </doc>

Each <section> tag can itself contain a <sections> tag, which in turn contains <section> tags, and so on. The maximum depth is 3.

So, I decided to use these separate fields:

"pageTitle" - doc/title
"sectionTitle" - doc/sections/section/title
"sectionText" - doc/sections/section/text
"subSectionTitle" - doc/sections/section/sections/section/title
"subSectionText" - doc/sections/section/sections/section/text
"subSubSectionTitle" - ...
"subSubSectionText" - ...

Currently, when I index, each Lucene document is a separate section text, section title, or one of the sub-level equivalents, but all of them carry the same pageTitle field, of course.

Is that a good way to index the documents for searching? I describe below how I'm going to search.

*How I'm going to search*

The real page/document structure is like this: pageTitle is the disease name, and a sectionTitle can be, e.g., "Definition" or "Treatment" or something like that.

So, when the user asks a question like "What are the treatments for "x" disease?", I classify the question as a "treatment" type. I would then like to search for the disease name in the Lucene index, but specifically retrieve the section whose title is "treatment".

Is that a good indexing approach? Also, how would you recommend I construct the search query, given that I want to give the disease name more weight and the type ("treatment") relatively less?

Thanks in advance!

--
View this message in context: http://lucene.472066.n3.nabble.com/Help-with-document-design-for-indexing-searching-tp4075228.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
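P.S. To make the layout described above more concrete, here is a rough sketch in plain Java (JDK only, no Lucene calls) of how I imagine flattening one <doc> into per-section records. The class and method names (SectionFlattener, flatten, buildQuery) are made up for illustration, and I'm assuming one record per section that holds both its title and its text. The query string is only my guess at how the weighting could be written in the classic query parser syntax (field:term with a ^boost suffix); the boost value is arbitrary.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helper: flattens one <doc> into one flat record per (sub)section. */
final class SectionFlattener {

    // Field-name prefixes by nesting depth (the maximum depth is 3).
    private static final String[] PREFIX = {"section", "subSection", "subSubSection"};

    /** Parses the XML and returns one field map per section, at any depth. */
    static List<Map<String, String>> flatten(String xml) {
        try {
            Element doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            List<Map<String, String>> records = new ArrayList<>();
            collect(doc, childText(doc, "title"), 0, records);
            return records;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse document", e);
        }
    }

    // Walks parent/sections/section and recurses into nested sections.
    private static void collect(Element parent, String pageTitle, int depth,
                                List<Map<String, String>> out) {
        Element sections = child(parent, "sections");
        if (sections == null || depth >= PREFIX.length) {
            return;
        }
        for (Node n = sections.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!(n instanceof Element) || !"section".equals(n.getNodeName())) {
                continue;
            }
            Element section = (Element) n;
            Map<String, String> record = new LinkedHashMap<>();
            record.put("pageTitle", pageTitle); // shared by every record of the page
            record.put(PREFIX[depth] + "Title", childText(section, "title"));
            record.put(PREFIX[depth] + "Text", childText(section, "text"));
            out.add(record);
            collect(section, pageTitle, depth + 1, out);
        }
    }

    // Returns the first direct child element with the given name, or null.
    private static Element child(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && name.equals(n.getNodeName())) {
                return (Element) n;
            }
        }
        return null;
    }

    private static String childText(Element parent, String name) {
        Element e = child(parent, name);
        return e == null ? "" : e.getTextContent().trim();
    }

    /**
     * Builds a query string that weights the disease name above the question type.
     * Assumption: classic query parser syntax. No escaping is done here, so real
     * user input would need it. Only the top-level sectionTitle field is queried.
     */
    static String buildQuery(String disease, String type, float diseaseBoost) {
        return "pageTitle:\"" + disease + "\"^" + diseaseBoost + " sectionTitle:" + type;
    }
}
```

Each returned map would then become one Lucene document, with the map keys as field names. One thing this sketch makes visible is that a sub-section record has a subSectionTitle but no sectionTitle, so a query on sectionTitle alone would not match it. I'm not sure whether that is a problem with my field naming.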