Beto,
Here is an idea I have been working on as a workaround:
Suppose you want to create a new document. The steps to do that are:
1. Insert the document into a "Pending Documents" table in the database.
2. Index the document with Lucene.
3. Insert the document into the "Documents" table in the database, and
remove it from the "Pending Documents" table in a single transaction.
Periodically, delete from Lucene's index any documents found in the "Pending
Documents" table.
Also, when returning results, filter out any documents found in the "Pending
Documents" table.
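To make the steps concrete, here is a minimal Java sketch of the create
procedure and the result filter. It is only an illustration: the class
PendingDocumentsSketch, its fields and the failAfterStep parameter are all
made up for this example, and plain in-memory collections stand in for the
two database tables and for Lucene's index, so none of this is Lucene or
JDBC API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory model of the procedure; not Lucene or JDBC code. */
class PendingDocumentsSketch {
    // Stand-ins for the "Pending Documents" and "Documents" tables.
    final Set<String> pendingTable = new HashSet<>();
    final Set<String> documentsTable = new HashSet<>();
    // Stand-in for the Lucene index: document id -> indexed text.
    final Map<String, String> index = new HashMap<>();

    /**
     * Creates a document. failAfterStep (an assumption of this sketch)
     * simulates a crash right after step 1 or step 2; 0 means no failure.
     */
    void create(String id, String text, int failAfterStep) {
        pendingTable.add(id);                          // step 1
        if (failAfterStep == 1) {
            throw new IllegalStateException("crash after step 1");
        }
        index.put(id, text);                           // step 2
        if (failAfterStep == 2) {
            throw new IllegalStateException("crash after step 2");
        }
        // Step 3: with a real database these two statements would run
        // inside a single transaction.
        documentsTable.add(id);
        pendingTable.remove(id);
    }

    /** Returns the ids of matching documents, hiding those still pending. */
    List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> entry : index.entrySet()) {
            boolean matches = entry.getValue().contains(term);
            if (matches && !pendingTable.contains(entry.getKey())) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        PendingDocumentsSketch s = new PendingDocumentsSketch();
        s.create("d1", "lucene transactions", 0);
        try {
            s.create("d2", "lucene rollback", 2);      // fails before step 3
        } catch (IllegalStateException e) {
            System.out.println("simulated failure: " + e.getMessage());
        }
        // d2 is in the index but still pending, so only d1 is listed.
        System.out.println(s.search("lucene"));
    }
}
```

In the sketch, d2 never reaches step 3, so it stays in the pending set and
the filter in search() keeps it out of the results even though it has been
indexed.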
Basically, the Pending Documents table records the documents whose creation
has started but which have not yet been committed to the Documents table,
whether or not Lucene has already indexed them. Note how, if an error
happens between steps 2 and 3, or in step 3, the document will still be
found in the Pending Documents table. So you are in effect implementing a
rollback for the whole procedure by deleting from the index whatever is
found in this table. If everything goes well, you remove the document from
Pending Documents, and then you know it exists both in the database and in
Lucene's index.
Also, if an error happens after step 1 but before step 2, you are simply
left with an entry in the Pending Documents table, which you can remove when
you discover that there is no corresponding document in Lucene's index.
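The periodic cleanup that covers both failure cases could look roughly like
the sketch below. Again, the names (PendingSweepSketch, sweep) are
hypothetical, and sets of document ids stand in for the Pending Documents
table and for the index. It also assumes that no creation is in flight while
the sweep runs; a real implementation would presumably need some way to skip
rows that are still being processed.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

/** Hypothetical cleanup pass over the pending table; not Lucene or JDBC code. */
class PendingSweepSketch {
    /**
     * Rolls back every half-finished creation and returns how many
     * entries had to be removed from the index.
     */
    static int sweep(Set<String> pendingTable, Set<String> indexedIds) {
        int removedFromIndex = 0;
        for (Iterator<String> it = pendingTable.iterator(); it.hasNext(); ) {
            String id = it.next();
            // Failure between steps 2 and 3, or in step 3: the document
            // was indexed, so un-index it first.
            if (indexedIds.remove(id)) {
                removedFromIndex++;
            }
            // Failure right after step 1: nothing was indexed, so only
            // the pending row itself has to go.
            it.remove();
        }
        return removedFromIndex;
    }

    public static void main(String[] args) {
        Set<String> pending = new HashSet<>(Arrays.asList("a", "b"));
        Set<String> indexed = new HashSet<>(Arrays.asList("a", "c"));
        System.out.println(sweep(pending, indexed) + " index entries rolled back");
    }
}
```

The index entry is removed before the pending row on purpose: if the sweep
itself fails halfway, the row is still there and the next run picks it up.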
This is of course rather ad hoc and does not generalize well to other types
of operations (e.g. updates). But it can be a viable workaround if you
don't want the added complexity of projects like Compass.
What do you think?
Marios Skounakis
----- Original Message -----
From: "Beto Siless" <[EMAIL PROTECTED]>
To: <java-user@lucene.apache.org>
Sent: Thursday, November 17, 2005 11:18 PM
Subject: Re: Lucene & Transactional semantics
Hi, I have the transaction problem too: I have Documents which are
represented by a Business Object (persisted in a DB with an ORM), indexed
with Lucene and finally stored in the file system. So it's very difficult
to maintain consistency in an error scenario.
The main problem is that if you implement some ad hoc transaction with
Lucene (working in a RAMDirectory or keeping the commands to apply until
the end), you still have to coordinate the Lucene transaction with the
others. If the Lucene transaction rolls back, you can abort the DB
transaction, but if the Lucene transaction commits and the DB transaction
then fails, you can't do anything without a 3PC transaction manager.
Does anybody have an idea about how to reduce the error time window? Could
this problem be solved by storing the index in a database?
Thanks
Beto
Marios Skounakis wrote:
Hi all,
I am interested in developing a system which will use Lucene to implement
the search functionality. A key characteristic of this system is that
certain information about the indexed documents will be editable by the
user administrators. For instance, the user administrators can manually
create "document collections" and assign some of the indexed documents to
them. One way to implement document collections would be to give documents
a dedicated field for storing the document collection id, and to store
the document collection information in a database.
Ideally, such an operation as the above should have transactional
semantics, i.e. if a user wants to assign documents x, y and z to
collection C, then either all three documents should be assigned to the
collection or, in case of error, none of the documents should be assigned
to the collection. Also, if the operation were to be followed by an SQL
query to update the database with the number of documents assigned to
collection C, that should be included in the "transaction" as well.
Is there a straightforward way to do this with Lucene? Or are
"transactions" a no-no for a system like Lucene and I should just go
ahead without having transactional semantics?
Thanks in advance,
Marios Skounakis
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]