One thing it took me a while to grasp, and that is not obvious to folks with significant database backgrounds, is that the fields in a Lucene document are related to those of any other document only by the meaning you, as a programmer, give them. That is, document 1 may have fields a, b, and c, while document 2 has fields b, e, and g. In this example there is no requirement that document 1 also have fields e and g, or that document 2 have fields a and c. In other words, Lucene documents don't fit into a table model.
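To make the point concrete, here is a minimal plain-Java sketch that does not use Lucene at all. Each hypothetical "document" is simply a map from field name to value, mirroring the a, b, c / b, e, g example above. A search on a field that a document lacks just skips that document; nothing like a NULL column is involved. The class and method names are illustrative assumptions, not Lucene API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Plain-Java stand-in for Lucene documents (no Lucene classes used):
// each "document" is a map from field name to value, and two documents
// need not share any fields.
class SchemaFreeDocs {

    // Returns the ids of documents whose given field equals the given value.
    // Documents that lack the field are skipped; there is no NULL column.
    static List<Integer> search(Map<Integer, Map<String, String>> docs, String field, String value) {
        return docs.entrySet().stream()
                .filter(e -> value.equals(e.getValue().get(field)))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    static Map<Integer, Map<String, String>> sampleDocs() {
        Map<Integer, Map<String, String>> docs = new LinkedHashMap<>();
        docs.put(1, Map.of("a", "x", "b", "shared", "c", "z")); // fields a, b, c
        docs.put(2, Map.of("b", "shared", "e", "y", "g", "w")); // fields b, e, g
        return docs;
    }

    public static void main(String[] args) {
        Map<Integer, Map<String, String>> docs = sampleDocs();
        System.out.println(search(docs, "b", "shared")); // [1, 2]: both have field b
        System.out.println(search(docs, "e", "y"));      // [2]: only document 2 has field e
    }
}
```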
The reason I mention that is that I'm extremely leery of packing data that really doesn't belong together into a single field. Plus, your searching becomes more complicated. In your example above, what happens if the file name and the image attribute are similar enough to produce false hits? If you stored them as separate fields in a document, you wouldn't have this kind of problem.

So, if you can cleverly de-normalize your data in such a way as to satisfy all the searches you'll ever want to perform, you can store it all in a Lucene index and be happy. If you can't, you could use Lucene to search the parts you *do* care about and store the rest in a database. Or, you could just use a database. I believe it all hinges on whether or not you have a fixed set of queries you can anticipate (and thus reflect in a Lucene index).

Best,
Erick

On 11/2/06, Rajesh parab <[EMAIL PROTECTED]> wrote:
Thanks for the feedback, Chris. I agree with you: the data set should be flattened out to be stored in a Lucene index. The Folder-File case was just an example. As you know, a relational database can have more complex relationships, and I understand that this model may not work for deeper ones. What I am mainly interested in is a relationship just one level deep. But I would like to search on the additional attributes of the related object. For example, in the Folder-File relationship, I would like to use additional file attributes as search criteria, along with the file name, when searching for folders.

The way I see it, I would have a single field for the related object and all its additional attributes, and use some separator when capturing this data inside the Lucene Field object. For example:

new Field("file", "abc.txt<sep>image");

But I am not quite sure this model will work.

BTW, I did not understand what you meant by the detached approach. Can you please elaborate?

Regards,
Rajesh

----- Original Message ----
From: Chris Lu <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Thursday, November 2, 2006 7:57:46 PM
Subject: Re: Modelling relational data in Lucene Index?

For this specific question, you can create an index on files, search for files of type image, and from the matched files find the unique directories (this can be done in Lucene, or you can do it in Java). Of course, this does not scale to deeper relationships.

Usually you do need to flatten the database objects in order to use Lucene. It's just trading space for speed.

I would prefer a detached approach over Hibernate's or EJB's approach, which is too tightly coupled to the system it lives in. How do you rebuild if the index is corrupted, or you have a new Analyzer, or the schema evolves? How do you make it multi-thread safe?
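Chris's first suggestion (index the files, search for type image, then collect the unique folders) can be sketched in plain Java, again without any Lucene classes. The sketch assumes, hypothetically, one document per file that carries a copy of its folder path; `FileDoc` and `foldersWith` are illustrative names. The de-duplication of folders is done in Java, which is one of the two options Chris mentions.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Plain-Java sketch of "index files, then find the unique folders".
// Each FileDoc stands in for one (hypothetical) per-file document into which
// the folder path has been copied, i.e. the data is denormalized.
class FoldersFromFiles {

    record FileDoc(String folder, String name, String type) {}

    // Step 1: match file documents on the type and a name fragment. Both
    // criteria are checked against the same file document.
    // Step 2: collapse the matching files to their distinct folders,
    // keeping first-seen order.
    static Set<String> foldersWith(List<FileDoc> index, String type, String nameFragment) {
        Set<String> folders = new LinkedHashSet<>();
        for (FileDoc doc : index) {
            if (doc.type().equals(type) && doc.name().contains(nameFragment)) {
                folders.add(doc.folder());
            }
        }
        return folders;
    }

    static List<FileDoc> sampleIndex() {
        return List.of(
                new FileDoc("/photos", "cat.jpg", "image"),
                new FileDoc("/photos", "dog.jpg", "image"),
                new FileDoc("/docs", "abc.txt", "text"),
                new FileDoc("/docs", "abc.png", "image"));
    }

    public static void main(String[] args) {
        System.out.println(foldersWith(sampleIndex(), "image", ""));    // [/photos, /docs]
        System.out.println(foldersWith(sampleIndex(), "image", "abc")); // [/docs]
    }
}
```

Because each file is its own document, the name and type criteria are always evaluated against the same file, which covers Rajesh's one-level-deep case without a separator-packed field.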
--
Chris Lu
-------------------------
Instant Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com

On 11/2/06, Mark Miller <[EMAIL PROTECTED]> wrote:
> Lucene is probably not the solution if you are looking for a relational
> model. You should be using a database for that. If you want to combine
> Lucene with a relational model, check out Hibernate and the new EJB
> annotations that it supports... there is a cool little Lucene add-on that
> lets you declare fields to be indexed (and how) with annotations.
>
> - Mark
>
> Rajesh parab wrote:
> > Hi,
> >
> > As I understand it, Lucene has a flat structure where you can define multiple fields inside the document. There is no relationship between fields.
> >
> > I would like to enable index-based search for some of the components inside a relational database. For example, let's say a "Folder" object. The Folder object can have a relationship with a File object. The File object, in turn, can have attributes like is image, is text file, etc. So, the structure is
> >
> > Folder --> File
> >             |
> >             -------> is image, is text file, ......
> >
> > I would like to enable a search to find a Folder with a File of type image. How can we model such relational data inside a Lucene index?
> >
> > Regards,
> > Rajesh
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
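Erick's concern about packing the file name and the file type into a single separator-delimited field, as in Rajesh's `new Field("file", "abc.txt<sep>image")` idea, can be illustrated with a small plain-Java sketch. It does not use Lucene: `tokenize` is a crude, assumed stand-in for an analyzer that lowercases and splits on non-alphanumeric characters, and the file names are made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Plain-Java sketch (no Lucene classes) contrasting one packed field with
// separate fields. tokenize() is a crude stand-in for an analyzer.
class PackedVsSeparate {

    record FileRow(String name, String type) {}

    // Separator taken from the proposal new Field("file", "abc.txt<sep>image").
    static final String SEP = "<sep>";

    // Lowercases and splits on runs of non-alphanumeric characters.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+"));
    }

    // Packed approach: name and type share one field, so a query term can
    // match a token from either part.
    static List<String> searchPacked(List<FileRow> files, String term) {
        return files.stream()
                .filter(f -> tokenize(f.name() + SEP + f.type()).contains(term))
                .map(FileRow::name)
                .collect(Collectors.toList());
    }

    // Separate-field approach: the value is matched only against the type field.
    static List<String> searchByType(List<FileRow> files, String type) {
        return files.stream()
                .filter(f -> f.type().equals(type))
                .map(FileRow::name)
                .collect(Collectors.toList());
    }

    static List<FileRow> sampleFiles() {
        return List.of(new FileRow("cat.jpg", "image"),
                       new FileRow("image-notes.txt", "text"));
    }

    public static void main(String[] args) {
        System.out.println(searchPacked(sampleFiles(), "image")); // [cat.jpg, image-notes.txt]
        System.out.println(searchByType(sampleFiles(), "image")); // [cat.jpg]
    }
}
```

Here `image-notes.txt` matches the term `image` in the packed field even though it is a text file; matching against a dedicated type field avoids that false hit.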