Re: Lucene index on relational data

Karl Wettin Fri, 11 Apr 2008 16:04:07 -0700

How much data do you have? I have a hard time understanding the relationship between your objects and what sort of normalized data you add to the documents.

If you are lucky, it is just a single field or a few fields that need to be updated, and you can manage to keep it in RAM and rebuild the whole thing every time something happens or on some schedule.
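To illustrate why a full rebuild sidesteps the ordering problem, here is a minimal plain-Java model, not Lucene code. It assumes a document's number is simply its position in an append-only list, and that a delete followed by a re-add ends up moving the document to the end once the deleted slot has been merged away. The class and method names (ParallelAlignmentSketch, buildIndex, deleteAndReAdd, aligned) and the "key:value" document strings are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of two parallel indexes; none of this is Lucene API.
// Assumption: a document number is the document's position in the list.
class ParallelAlignmentSketch {

    // Builds an index by adding one "key:value" document per key, in order.
    static List<String> buildIndex(List<String> keysInOrder, String value) {
        List<String> index = new ArrayList<>();
        for (String key : keysInOrder) {
            index.add(key + ":" + value);
        }
        return index;
    }

    // Models delete + re-add after the deleted slot has been merged away:
    // the old entry disappears and the new copy is appended at the end.
    static List<String> deleteAndReAdd(List<String> index, int docNumber, String newDoc) {
        List<String> result = new ArrayList<>(index);
        result.remove(docNumber);
        result.add(newDoc);
        return result;
    }

    // Two indexes are aligned if every document number maps to the same key in both.
    static boolean aligned(List<String> a, List<String> b) {
        if (a.size() != b.size()) {
            return false;
        }
        for (int i = 0; i < a.size(); i++) {
            String keyA = a.get(i).substring(0, a.get(i).indexOf(':'));
            String keyB = b.get(i).substring(0, b.get(i).indexOf(':'));
            if (!keyA.equals(keyB)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("A1", "A2", "A3");
        List<String> staticIndex = buildIndex(keys, "static");
        List<String> dynamicIndex = buildIndex(keys, "v1");
        System.out.println(aligned(staticIndex, dynamicIndex));

        // Deleting and re-adding A1 shifts A2 and A3 down and moves A1 to the end.
        List<String> updated = deleteAndReAdd(dynamicIndex, 0, "A1:v2");
        System.out.println(aligned(staticIndex, updated));

        // Rebuilding the whole small index in the original key order restores alignment.
        List<String> rebuilt = buildIndex(keys, "v2");
        System.out.println(aligned(staticIndex, rebuilt));
    }
}
```

In this model the three checks print true, false, true: only the rebuilt index lines up with the untouched static one again.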

There are some hacks in JIRA that allow you to replace a document at a certain position at index optimization time. You might want to update a number of documents every time you do that.


https://issues.apache.org/jira/browse/LUCENE-879

Rajesh parab wrote:

Thanks for the details, Karl.

I was looking for something like that. However, I have a
question about the warning in the javadoc of
ParallelReader.

It says -
It is up to you to make sure all indexes are created
and modified the same way. For example, if you add
documents to one index, you need to add the same
documents in the same order to the other indexes.
Failure to do so will result in undefined behavior.


So now, if I want to update one of the documents in my
dynamic index, I will have to delete the document and
insert it again, as Lucene does not allow updating a
document. Correct? If this is the case, re-inserting the
document in the dynamic index will put it out of order
with the static index, which is not modified. How should
we take care of this situation?
Am I missing something here?

Regards,
Rajesh

--- Karl Wettin <[EMAIL PROTECTED]> wrote:

Hi Rajesh,

I think you are looking for ParallelReader.

<http://lucene.apache.org/java/2_0_0/api/org/apache/lucene/index/ParallelReader.html>

public class ParallelReader
extends IndexReader

An IndexReader which reads multiple, parallel indexes. Each index added must have the same number of documents, but typically each contains different fields. Each document contains the union of the fields of all documents with the same document number. When searching, matches for a query term are from the first index added that has the field.

This is useful, e.g., with collections that have large fields which change rarely and small fields that change more frequently. The smaller fields may be re-indexed in a new index and both indexes may be searched together.


Warning: It is up to you to make sure all indexes are created and modified the same way. For example, if you add documents to one index, you need to add the same documents in the same order to the other indexes. Failure to do so will result in undefined behavior.
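As a rough illustration of the "union of the fields" rule quoted above, here is a small plain-Java model. It is not the Lucene class, only a sketch of the idea: documents with the same number are merged, and it assumes that for a field name present in several indexes the first index added wins, mirroring the searching rule in the javadoc. The class name ParallelUnionSketch, the helper doc, and the field names (id, body, xStatus) are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the behaviour described above; this is not the Lucene class.
class ParallelUnionSketch {

    // Each index is a list of documents; each document maps field name to value.
    // Returns the merged view of the document at docNumber across all indexes.
    static Map<String, String> document(List<List<Map<String, String>>> indexes, int docNumber) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (List<Map<String, String>> index : indexes) {
            for (Map.Entry<String, String> field : index.get(docNumber).entrySet()) {
                // Assumption: the first index added that has the field wins.
                merged.putIfAbsent(field.getKey(), field.getValue());
            }
        }
        return merged;
    }

    // Convenience helper: doc("id", "A1", "body", "text") builds a field map.
    static Map<String, String> doc(String... namesAndValues) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i + 1 < namesAndValues.length; i += 2) {
            fields.put(namesAndValues[i], namesAndValues[i + 1]);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Large, rarely changing fields of the base object.
        List<Map<String, String>> staticIndex = new ArrayList<>();
        staticIndex.add(doc("id", "A1", "body", "large text"));

        // Small, frequently changing fields, stored under the same document number.
        List<Map<String, String>> dynamicIndex = new ArrayList<>();
        dynamicIndex.add(doc("id", "ignored", "xStatus", "open"));

        List<List<Map<String, String>>> parallel = new ArrayList<>();
        parallel.add(staticIndex);
        parallel.add(dynamicIndex);

        System.out.println(document(parallel, 0));
    }
}
```

Running the sketch prints {id=A1, body=large text, xStatus=open}: document 0 carries fields from both indexes, and the duplicated id field comes from the index that was added first.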



     karl

Rajesh parab wrote:

Hi,

We are using Lucene 2.0 to index data stored inside a
relational database. Like any relational database, our
database has quite a few one-to-one and one-to-many
relationships. For example, let’s say an Object A has a
one-to-many relationship with Object X and Object Y.
As we need to de-normalize relational data as key-value
pairs before storing it inside a Lucene index, we have
de-normalized these relationships (Object X and Object Y)
while building an index on Object A.

We have a large number of such object relationships, and
most of the time the related objects are modified more
frequently than the base objects. For example, in the
case above, objects X and Y are updated in the system
very frequently, whereas Object A is not updated that
often. Still, we need to update the Object A entries
inside the index every time its related objects X
and/or Y are modified.

To avoid this situation, we were thinking of having two
separate indexes: the first index will only index data
of the base objects (Object A in the example above), and
the second index will contain data about its related
objects (Object X and Y above), which are updated more
frequently. This way, the more frequent updates to
Object X and Y will only affect the second index, which
stores the relationship information, and reduce the cost
of re-indexing Object A. However, I don’t think
MultiSearcher will be helpful if we want to search for
data that spans both indexes (e.g. some fields of
Object A in the first index and some fields of Object X
or Y in the second index).

Does Lucene have any option to handle such a scenario?
Can we search across multiple indexes that have some
relationships between them, and search for fields that
span these indexes?

Regards,
Rajesh


---------------------------------------------------------------------

To unsubscribe, e-mail:

[EMAIL PROTECTED]

For additional commands, e-mail:

[EMAIL PROTECTED]
