I guess I don't understand what a commit lock is, or what its purpose is. It seems the write lock is all that is needed.

If you still need a write lock, then what is the purpose of "lockless" commits?

You can get consistency if all writers get the write lock before performing any read. It would seem this should be the requirement?

Is there a Wiki or some such thing that discusses the "lockless commits", their purpose and their implementation? I find the email thread a bit cumbersome to review.


On Jan 23, 2008, at 11:55 AM, Michael McCandless wrote:


robert engels wrote:

Maybe I don't understand lockless commits then.

I just don't think you can enforce transactional consistency without either 1) locking, or 2) optimistic collision detection. I could be wrong here, but this has been my experience. By effectively removing the locking requirement, I think you are going to have users developing code without thought as to what is going to happen when locking is added. This is going to break the backwards compatibility that people are striving for.
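As an illustration of option 2, here is a minimal sketch of optimistic collision detection in Python. It is a toy in-memory model, not Lucene code: `VersionedStore`, `StaleVersionError`, and the version counter are hypothetical names introduced for the example. It runs single-threaded, so it assumes the check and the apply inside `commit` happen atomically; a real implementation would have to guarantee that.

```python
class StaleVersionError(Exception):
    """Raised when another writer committed after this transaction began."""


class VersionedStore:
    """Toy in-memory store; `version` increases by one on every commit."""

    def __init__(self):
        self.version = 0
        self.docs = []

    def begin(self):
        # A transaction remembers the version it started from.
        return {"base_version": self.version, "pending": []}

    def commit(self, tx):
        # Optimistic check: refuse the commit if anyone committed since begin().
        if tx["base_version"] != self.version:
            raise StaleVersionError("store changed since transaction began")
        self.docs.extend(tx["pending"])
        self.version += 1


store = VersionedStore()
tx_a = store.begin()
tx_b = store.begin()
tx_a["pending"].append({"OID": "some field"})
tx_b["pending"].append({"OID": "some field"})

store.commit(tx_a)  # first committer wins
try:
    store.commit(tx_b)  # second committer started from a stale version
    b_committed = True
except StaleVersionError:
    b_committed = False
```

Neither client blocks the other here; the collision is only caught at commit time, and the losing client has to retry from a fresh read.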

Lucene still has locking (write.lock), to only allow one writer at a time to make changes to the index (ie, it serializes writer sessions). Lock-less commits just replaced the old "commit.lock".
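To make "serializes writer sessions" concrete, the sketch below models an exclusive lock file in Python. It only imitates the effect of a write lock and is not how Lucene implements its locks; the `WriteLock` and `LockHeldError` names are hypothetical, and the sketch assumes the filesystem creates files atomically with `O_CREAT | O_EXCL`.

```python
import os
import tempfile


class LockHeldError(Exception):
    """Raised when another writer already holds the lock file."""


class WriteLock:
    """Exclusive lock represented by a file; only one holder at a time."""

    def __init__(self, directory, name="write.lock"):
        self.path = os.path.join(directory, name)
        self.held = False

    def acquire(self):
        try:
            # O_CREAT | O_EXCL fails if the file already exists.
            fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            raise LockHeldError(self.path) from None
        os.close(fd)
        self.held = True

    def release(self):
        if self.held:
            os.remove(self.path)
            self.held = False


index_dir = tempfile.mkdtemp()  # stand-in for an index directory
writer_a = WriteLock(index_dir)
writer_b = WriteLock(index_dir)

writer_a.acquire()
try:
    writer_b.acquire()  # rejected while writer A holds the lock
    b_got_lock = True
except LockHeldError:
    b_got_lock = False

writer_a.release()
writer_b.acquire()  # succeeds once A has released
b_got_lock_after_release = writer_b.held
writer_b.release()
```

Note that this lock only excludes other writers. Readers are never blocked by it, which is the gap the scenario later in the thread runs into.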

The Lucene "writer" structure needs to be something like:

start tx for update
do work
commit

where commit is composed of two phases (prepare and commit), and the commit may fail.

Right, this is what IndexWriter does now. It's just that with autoCommit=false you have total control on when that commit takes place (only on closing the writer).

It is unknown if this can actually happen though, since there is no unique ID that could cause collisions, but there is the internal id (which would need to remain constant throughout the tx in order for queries and delete operations to work).

Yes but there are other errors that Lucene may hit, like disk full, which must (and do) rollback the commit to the start of the transaction (ie, index state when writer was first opened).
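The start/work/commit structure with rollback on failure can be sketched as below. This is a toy Python model, not IndexWriter: `ToyWriter`, `DiskFullError`, and the `flush` callback are hypothetical names, and the "disk full" condition is simulated by a function that always raises.

```python
class DiskFullError(Exception):
    """Simulated I/O failure during the prepare phase."""


class ToyWriter:
    """Buffers changes; committed state is replaced only on a successful commit."""

    def __init__(self, committed_docs):
        self.committed = list(committed_docs)  # state when the writer was opened
        self.pending = list(committed_docs)    # working copy for this session

    def add(self, doc):
        self.pending.append(doc)

    def commit(self, flush):
        try:
            flush(self.pending)  # prepare phase: may fail, e.g. disk full
        except DiskFullError:
            # Roll back to the state at the start of the transaction.
            self.pending = list(self.committed)
            raise
        self.committed = list(self.pending)  # commit phase: publish new state


def failing_flush(docs):
    raise DiskFullError("no space left")


def ok_flush(docs):
    pass  # pretend the documents were written durably


writer = ToyWriter(["doc1"])
writer.add("doc2")
try:
    writer.commit(failing_flush)
    commit_failed = False
except DiskFullError:
    commit_failed = True
state_after_failure = list(writer.committed)  # still ["doc1"]

writer.add("doc3")
writer.commit(ok_flush)  # committed is now ["doc1", "doc3"]
```

The point of the sketch is that a failed commit leaves the committed state exactly as it was when the writer was opened, and the discarded work ("doc2") does not leak into a later commit.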

I am sure it is that I don't understand lockless commits, so I will give a scenario.

client A issues query looking for documents with OID (a field) = "some field";
client B issues same query
both queries return nothing found
client A inserts document with OID = "some field"
client B inserts document with OID = "some field"

client A commits and client B commits

Unless B is blocked once A issues the query, the index is going to end up with 2 different copies of the document.
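The interleaving in this scenario can be replayed deterministically with a toy in-memory index. The sketch below is plain Python and does not use Lucene; the `index` list and `search` helper are hypothetical stand-ins for the committed index and a query.

```python
index = []  # committed documents, visible to every client


def search(oid):
    return [d for d in index if d["OID"] == oid]


# Both clients query before either one has inserted anything.
a_found = search("some field")
b_found = search("some field")

# Each client inserts only because its own query came back empty.
if not a_found:
    index.append({"OID": "some field", "client": "A"})
if not b_found:
    index.append({"OID": "some field", "client": "B"})

duplicates = search("some field")  # both documents are now in the index
```

Serializing only the two inserts does not help: the decision to insert was made from a read taken before the other client's write, so the check and the insert have to be covered by the same lock or by a collision check at commit time.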

I understand that Lucene is not a database, and has no concept of unique constraints. It is my understanding that this has been overcome using locks and sequential access to the index when writing.

In a simple XA implementation, client A would open a SERIALIZABLE transaction, which would block B from even reading the index. Most simple XA implementations support only READ_COMMITTED, SERIALIZABLE, and NONE.

There are other ways of offering finer-grained locking (based on internal id and timestamps), but most are going to need a "server-based" implementation of Lucene to pull off.

To summarize, I think the "shared filestore (NFS)" and "lockless commits" make implementing transactions very difficult. I am sure I am missing something here, I just don't see what.

Lucene hasn't ever supported that case above: it never blocks a reader from opening the index. But, you could easily build that on top of Lucene, right?
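One way such a layer could look is sketched below: an application-level wrapper that holds a single lock across both the lookup and the insert, so a second client cannot read between the two. This is a hedged Python model with an in-memory list standing in for the index; `UniqueOidIndex` and `insert_if_absent` are hypothetical names, not part of Lucene, and the sketch assumes all clients go through the wrapper in one process.

```python
import threading


class UniqueOidIndex:
    """Application-level wrapper: one lock covers both lookup and insert."""

    def __init__(self):
        self._lock = threading.Lock()
        self._docs = []

    def search(self, oid):
        return [d for d in self._docs if d["OID"] == oid]

    def insert_if_absent(self, doc):
        with self._lock:
            if self.search(doc["OID"]):
                return False  # another client already inserted this OID
            self._docs.append(doc)
            return True


idx = UniqueOidIndex()
results = []


def client(name):
    results.append(idx.insert_if_absent({"OID": "some field", "client": name}))


threads = [threading.Thread(target=client, args=(n,)) for n in ("A", "B")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whichever client takes the lock first inserts the document and the other gets `False` back. Clients on different machines sharing one filestore would need a lock that spans processes, which this in-process sketch does not provide.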

I'm still trying to understand what you feel is missing in the core that prevents you from building XA (or, your own transactions handling that involves another resource like a DB) on top of Lucene...

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


