Re: Lucene index on relational data

Karl Wettin Fri, 11 Apr 2008 11:14:16 -0700

Hi Rajesh,

I think you are looking for ParallelReader.


<http://lucene.apache.org/java/2_0_0/api/org/apache/lucene/index/ParallelReader.html>

public class ParallelReader
extends IndexReader

An IndexReader which reads multiple, parallel indexes. Each index added must have the same number of documents, but typically each contains different fields. Each document contains the union of the fields of all documents with the same document number. When searching, matches for a query term are from the first index added that has the field.

This is useful, e.g., with collections that have large fields which change rarely and small fields that change more frequently. The smaller fields may be re-indexed in a new index and both indexes may be searched together.

Warning: It is up to you to make sure all indexes are created and modified the same way. For example, if you add documents to one index, you need to add the same documents in the same order to the other indexes. Failure to do so will result in undefined behavior.
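
To make the document-number alignment described above concrete, here is a small plain-Java toy model. It is not Lucene code and not how ParallelReader is implemented; the class ParallelIndexSketch, its methods, and the field names (id, name, xStatus) are all made up for illustration. It only mimics the documented rules: every index must hold the same number of documents, document n is the union of the fields of document n in each index, and the first index added that has a field wins.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of "parallel indexes" (hypothetical, NOT the Lucene API).
public class ParallelIndexSketch {
    // Each index is a list of documents; the list position is the document number.
    private final List<List<Map<String, String>>> indexes = new ArrayList<>();

    // Every index added must have the same number of documents.
    public void add(List<Map<String, String>> index) {
        if (!indexes.isEmpty() && indexes.get(0).size() != index.size()) {
            throw new IllegalArgumentException(
                "all parallel indexes must have the same number of documents");
        }
        indexes.add(index);
    }

    // Document n is the union of the fields of document n in every index.
    // putIfAbsent keeps the value from the first index added that has the field.
    public Map<String, String> document(int n) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (List<Map<String, String>> index : indexes) {
            for (Map.Entry<String, String> e : index.get(n).entrySet()) {
                merged.putIfAbsent(e.getKey(), e.getValue());
            }
        }
        return merged;
    }

    // Returns the document numbers whose merged document has field == value.
    public List<Integer> search(String field, String value) {
        List<Integer> hits = new ArrayList<>();
        int numDocs = indexes.isEmpty() ? 0 : indexes.get(0).size();
        for (int n = 0; n < numDocs; n++) {
            if (value.equals(document(n).get(field))) {
                hits.add(n);
            }
        }
        return hits;
    }

    // Helper: builds one document from alternating field names and values.
    public static Map<String, String> doc(String... keyValues) {
        Map<String, String> d = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            d.put(keyValues[i], keyValues[i + 1]);
        }
        return d;
    }

    public static void main(String[] args) {
        // Rarely changing base data (Object A), one document per object.
        List<Map<String, String>> base = new ArrayList<>();
        base.add(doc("id", "A1", "name", "pump"));
        base.add(doc("id", "A2", "name", "valve"));

        // Frequently changing relationship data, in the SAME order as the base index.
        List<Map<String, String>> related = new ArrayList<>();
        related.add(doc("xStatus", "open"));
        related.add(doc("xStatus", "closed"));

        ParallelIndexSketch parallel = new ParallelIndexSketch();
        parallel.add(base);
        parallel.add(related);

        System.out.println(parallel.search("xStatus", "open")); // prints [0]
        System.out.println(parallel.document(0)); // prints {id=A1, name=pump, xStatus=open}
    }
}
```

With the real class, the corresponding steps would be to create a ParallelReader, call add(IndexReader) once per index, and hand the reader to an IndexSearcher. If the second list were rebuilt in a different order, document 0 would silently combine fields from unrelated objects, which is the situation the warning above is about.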




    karl

Rajesh parab wrote:

Hi,

We are using Lucene 2.0 to index data stored inside
relational database. Like any relational database, our
database has quite a few one-to-one and one-to-many
relationships. For example, let’s say Object A has a
one-to-many relationship with Object X and Object Y.
As we need to de-normalize relational data as
key-value pairs before storing it inside Lucene index,
we have de-normalized these relationships (Object X
and Object Y) while building an index on Object A.

We have a large number of such object relationships and, most
of the time, the related objects are modified more
frequently than the base objects. For example, in our
above case, objects X and Y are updated in the system
very frequently, whereas Object A is not updated that
often. Still, we will need to update Object A entries
inside the index, every time its related objects X
and/or Y are modified.

To avoid the above situation, we were thinking of
having 2 separate indexes: the first index will only
index data of base objects (Object A in above example)
and the second index will contain data about its
relationship objects (Object X and Y above), which are
updated more frequently. This way, the more frequent
updates to Object X and Y will only impact the second
index that stores relationship information and reduce
the cost to re-index Object A. However, I don’t think
MultiSearcher will be helpful if we want to search for
data which spans across both indexes (e.g. some fields
of Object A in first index and some fields of Object X
or Y in second index).

Is there any option in Lucene to handle such a
scenario? Can we search across multiple indexes which
have some relationships between them and search for
fields that span across these indexes?

Regards,
Rajesh


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


