One thing that took me a while to grasp, and that is not automatic for folks
with significant database backgrounds, is that the fields in a Lucene document
are related to those of any other document only by the meaning you, as a
programmer, give them. That is, document 1 may have fields a, b, c.
Document 2 may have fields b, e, g. There is no requirement that, in this
example, document 1 have fields e and g, or vice versa. In
other words, Lucene documents don't fit into a table model.
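
To make that concrete, here is a minimal sketch in plain Java. It is a stand-in for the idea, not the Lucene API: the class name SchemaFreeDocs and its helper methods are made up for illustration, and a "document" is modeled as nothing more than a map of field names to values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the idea; this is NOT the Lucene API.
class SchemaFreeDocs {
    // A "document" is just a bag of named fields. Nothing forces two
    // documents to share the same field names.
    static Map<String, String> doc(String... nameValuePairs) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i + 1 < nameValuePairs.length; i += 2) {
            fields.put(nameValuePairs[i], nameValuePairs[i + 1]);
        }
        return fields;
    }

    // Field names that happen to appear in both documents.
    static Set<String> sharedFields(Map<String, String> d1, Map<String, String> d2) {
        Set<String> shared = new TreeSet<>(d1.keySet());
        shared.retainAll(d2.keySet());
        return shared;
    }

    public static void main(String[] args) {
        Map<String, String> doc1 = doc("a", "1", "b", "2", "c", "3");
        Map<String, String> doc2 = doc("b", "4", "e", "5", "g", "6");
        // Only "b" is common; neither document is "missing" a column.
        System.out.println(sharedFields(doc1, doc2));
    }
}
```

The only thing tying field b in document 1 to field b in document 2 is that you chose to give them the same name.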

The reason I mention this is that I'm extremely leery of packing data that
really doesn't belong together into a single field. It also makes your
searching more complicated.

In your example above, what happens if the file name and the "image" attribute
are similar enough to produce false hits? If you stored them as separate
fields in a document, you wouldn't have this kind of problem.
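
Here is a rough sketch of the false-hit problem, again in plain Java rather than Lucene. The tokenizer is an assumption: it lowercases and splits on anything that is not a letter or digit, which only loosely resembles what an analyzer might do. The class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration; not the Lucene API or a real analyzer.
class PackedFieldSketch {
    // Crude tokenizer (an assumption): lowercase, split on non-alphanumerics.
    static Set<String> tokens(String text) {
        Set<String> out = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Search over one packed field holding both the name and the type.
    static boolean packedMatches(String packedValue, String term) {
        return tokens(packedValue).contains(term);
    }

    // Search over a dedicated "type" field, compared as a whole value.
    static boolean typeMatches(String typeValue, String term) {
        return typeValue.equals(term);
    }

    public static void main(String[] args) {
        // A text file that happens to be named "image.txt".
        String packed = "image.txt<sep>text";
        System.out.println(packedMatches(packed, "image")); // false hit on the name
        System.out.println(typeMatches("text", "image"));   // separate field: no hit
    }
}
```

With the packed field, a search for type "image" cannot tell the file name apart from the type; with a separate type field, the question does not arise.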

So, if you can cleverly de-normalize your data in such a way as to satisfy
all the searches you'll ever want to perform, you can store it all in a
Lucene index and be happy. If you can't, you could use Lucene to search the
parts you *do* care about and store the rest in a database. Or, you could
just use a database. I believe it all hinges on whether you have a fixed set
of queries you can anticipate (and thus reflect in a Lucene index) or not.

Best
Erick

On 11/2/06, Rajesh parab <[EMAIL PROTECTED]> wrote:

Thanks for the feedback, Chris.

I agree with you. The data set should be flattened out before it is stored
inside a Lucene index. The Folder-File was just an example. As you know, in a
relational database, we can have more complex relationships. I understand
that this model may not work for deeper relationships.

What I am mainly interested in is a relationship that is just one level deep.
But I would like to search on the additional attributes of the related object.
For example, in the Folder-File relationship, I would like to use additional
file attributes as search criteria, along with the file name, while searching
for folders.

The way I see it is to have a single field for the related object and all its
additional attributes, and to use some separator while capturing this data
inside a Lucene Field object. For example -

            new Field("file", "abc.txt<sep>image");

But, I am not quite sure if this model will work.

BTW, I did not understand what you meant by the detached approach. Can you
please elaborate?

Regards,
Rajesh

----- Original Message ----
From: Chris Lu <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Thursday, November 2, 2006 7:57:46 PM
Subject: Re: Modelling relational data in Lucene Index?


For this specific question, you can create an index on files, search for
files that are of type image, and from the matched files, find the unique
directories (this can be done in Lucene, or you can do it in Java).
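
As a rough sketch of that idea in plain Java (not Lucene code; the FileDoc class, its fields, and the sample paths are all invented for illustration), each file becomes one flat record that carries its folder, and the unique folders are collected from the matching records:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for "one document per file"; not the Lucene API.
class FoldersByFileType {
    // One flattened record per file, with the parent folder copied in.
    static final class FileDoc {
        final String folder;
        final String name;
        final String type;

        FileDoc(String folder, String name, String type) {
            this.folder = folder;
            this.name = name;
            this.type = type;
        }
    }

    // "Search" files by type, then keep each matching folder only once.
    static Set<String> foldersWithType(List<FileDoc> index, String type) {
        Set<String> folders = new LinkedHashSet<>();
        for (FileDoc d : index) {
            if (d.type.equals(type)) {
                folders.add(d.folder);
            }
        }
        return folders;
    }

    public static void main(String[] args) {
        List<FileDoc> index = new ArrayList<>();
        index.add(new FileDoc("/photos", "a.png", "image"));
        index.add(new FileDoc("/photos", "b.png", "image"));
        index.add(new FileDoc("/docs", "notes.txt", "text"));
        index.add(new FileDoc("/mixed", "c.png", "image"));
        System.out.println(foldersWithType(index, "image"));
    }
}
```

The folder name is repeated on every file record, which is the extra space being spent to make the search simple.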

Of course this does not scale to deeper relationships. Usually you do
need to flatten the database objects in order to use Lucene. It's
just trading space for speed.

I would prefer a detached approach instead of Hibernate's or EJB's
approach, which is too tightly coupled with the rest of the system. How do
you rebuild if the index is corrupted, or you have a new Analyzer, or the
schema evolves? How do you make it multi-thread safe?

--
Chris Lu
-------------------------
Instant Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com

On 11/2/06, Mark Miller <[EMAIL PROTECTED]> wrote:
> Lucene is probably not the solution if you are looking for a relational
> model. You should be using a database for that. If you want to combine
> Lucene with a relational model, check out Hibernate and the new EJB
> annotations that it supports...there is a cool little Lucene add-on that
> lets you declare fields to be indexed (and how) with annotations.
>
> - Mark
>
> Rajesh parab wrote:
> > Hi,
> >
> > As I understand, Lucene has a flat structure where you can define
> > multiple fields inside the document. There is no relationship between
> > the fields.
> >
> > I would like to enable index-based search for some of the components
> > inside a relational database. For example, let's say a "Folder" object.
> > The Folder object can have a relationship with a File object. The File
> > object, in turn, can have attributes like is image, is text file, etc.
> > So, the structure is
> >
> >     Folder -- > File
> >              |
> >              ------- > is image, is text file, ......
> >
> >
> > I would like to enable a search to find a Folder with a File of type
> > image. How can we model such relational data inside a Lucene index?
> >
> > Regards,
> > Rajesh
> >
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]