Beto,

Here is an idea I have been working on as a workaround:

Suppose you want to create a new document. The steps to do that are:
1. Insert the document into a "Pending Documents" table in the database.
2. Index the document with Lucene
3. Insert the document into the "Documents" table in the database, and remove it from the "Pending Documents" table in a single transaction.

Periodically delete from Lucene's index Documents found in the "Pending Documents" table.

Also, when returning results, filter out Documents found in the "Pending Documents" table.

Basically, the Pending Documents table stores the documents that have been indexed by Lucene but have not yet been inserted in the database. Note how if an error happens between steps 2 and 3, or in step 3, the document will be found in the Pending Documents table. So you are kind of implementing a rollback for the whole procedure by deleting whatever is found in this table. If everything goes well, you remove the Document from Pending Documents, and then you know it exists both in the database and Lucene's index.

Also, if an error happens after 1, you are simply left with an entry in the Pending Documents table which you can remove when you discover that there is no corresponding document in Lucene's index.

This is of course rather ad-hoc and does not generalize well to other types of queries (e.g. updates, etc). But it can be a viable workaround if you don't want the added complexity of efforts like Compass.

What do you think?

Marios Skounakis

----- Original Message ----- From: "Beto Siless" <[EMAIL PROTECTED]>
To: <java-user@lucene.apache.org>
Sent: Thursday, November 17, 2005 11:18 PM
Subject: Re: Lucene & Transactional semantics



Hi, I'm with the transaction problem too: I have Documents which are represented by a Business Object (persisted in a DB with an ORM), indexed with Lucene and finally stored in the file system. So it's very difficult to maintain the consistency in an error scenario. The main problem is that if you implement some ad-hoc transaction with Lucene (working in a RAMDirectory or keeping the commands to apply until the end), you still have to coordinate the lucene transaction with the others. Cause if lucene transaction rollbacks you can abort the db transaction, but if lucene transaction commits you can't do anything if the DB transaction fails with out a 3pc transaction manager. Does Anybody have an idea about how to reduce the error time window? Could this problem be solved storing the index in a database?
Thanks
Beto


Marios Skounakis wrote:
Hi all,

I am interested in developing a system which will use Lucene to implement the search functionality. A key characteristic of this system is that certain information about the indexed documents will be editable by the user administrators. For instance, the user administrators can manually create "document collections" and assign some of the indexed documents to them. One way to implement document collections would by having documents have a dedicated field for storing the document collection id, and storing the document collection information in a database.

Ideally, such an operation as the above should have transactional semantics, i.e. if a user wants to assign documents x, y and z to collection C, then either all three documents should be assigned to the collection or, in case of error, none of the documents should be assigned to the collection. Also, if the operation were to be followed by an SQL query to update the database with the number of documents assigned to collection C, that should be included in the "transaction" as well.

Is there a straightforward way to do this with Lucene? Or are "transactions" a no-no for a system like Lucene and I should just go ahead without having transactional semantics?

Thanks in advance,

Marios Skounakis


------------------------------------------------------------------------

No virus found in this incoming message.
Checked by AVG Free Edition.
Version: 7.1.362 / Virus Database: 267.13.1/169 - Release Date: 11/15/2005

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to