Maybe I don't understand lockless commits then.
I just don't think you can enforce transactional consistency without
either 1) locking, or 2) optimistic collision detection. I could be
wrong here, but this has been my experience.
By effectively removing the locking requirement, I think you are
going to have users developing code without thought as to what is
going to happen when locking is added. This is going to break the
backwards compatibility that people are striving for.
The Lucene "writer" structure needs to be something like:
start tx for update
do work
commit
where commit is composed of prepare and commit phases, but commit
may fail.
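A minimal sketch of that structure, under stated assumptions: SketchIndex, SketchTx and CommitFailedException are hypothetical names invented here and are not part of Lucene. The "prepare" phase is modeled as a simple version check (optimistic collision detection) against a toy in-memory index, and the "commit" phase publishes the buffered changes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types for illustration only; none of these exist in Lucene.

// Thrown when the prepare phase detects a conflicting commit.
final class CommitFailedException extends RuntimeException {
    CommitFailedException(String message) {
        super(message);
    }
}

// A toy in-memory "index" that bumps a version on every successful commit.
final class SketchIndex {
    private final List<String> committed = new ArrayList<>();
    private long version = 0;

    // "start tx for update": remember the version the tx started from.
    synchronized SketchTx begin() {
        return new SketchTx(this, version);
    }

    // Copy of the committed documents, for inspection.
    synchronized List<String> snapshot() {
        return new ArrayList<>(committed);
    }

    // Prepare and commit run under one monitor so they are atomic together.
    synchronized void prepareAndCommit(long startVersion, List<String> pending) {
        // Prepare phase: fail if anyone else committed since this tx began.
        if (startVersion != version) {
            throw new CommitFailedException("index changed since tx start");
        }
        // Commit phase: publish the buffered changes.
        committed.addAll(pending);
        version++;
    }
}

// A transaction buffers its work and touches the index only on commit.
final class SketchTx {
    private final SketchIndex index;
    private final long startVersion;
    private final List<String> pending = new ArrayList<>();
    private boolean open = true;

    SketchTx(SketchIndex index, long startVersion) {
        this.index = index;
        this.startVersion = startVersion;
    }

    // "do work": changes stay private to this tx until commit.
    void add(String doc) {
        requireOpen();
        pending.add(doc);
    }

    // "commit": may fail with CommitFailedException.
    void commit() {
        requireOpen();
        open = false;
        index.prepareAndCommit(startVersion, pending);
    }

    // Discard the buffered work without touching the index.
    void rollback() {
        open = false;
        pending.clear();
    }

    private void requireOpen() {
        if (!open) {
            throw new IllegalStateException("transaction already finished");
        }
    }
}
```

If two transactions begin at the same version and both add a document, the first commit succeeds and the second throws CommitFailedException, so the caller has to retry or give up. That is the "commit may fail" case in this model.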
It is unknown if this can actually happen though, since there is no
unique ID that could cause collisions, but there is the internal id
(which would need to remain constant throughout the tx in order for
queries and delete operations to work).
I am sure it is that I don't understand lockless commits, so I will
give a scenario.
client A issues query looking for documents with OID (a field) =
"some field";
client B issues same query
both queries return nothing found
client A inserts document with OID = "some field"
client B inserts document with OID = "some field"
client A commits and client B commits
Unless B is blocked once A issues the query, the index is going to
end up with 2 copies of the same document.
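The scenario can be replayed deterministically against a toy store to make the outcome concrete. This is only a sketch: OidStore, DuplicateScenario and insertIfAbsent are hypothetical names, not Lucene API, and the store is a plain in-memory list with no uniqueness constraint. The second method shows the locking alternative, where the query and the insert happen under one lock.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy store for illustration only; not part of Lucene.
final class OidStore {
    private final List<String> docs = new ArrayList<>();

    // Query: is there a document with this OID?
    synchronized boolean contains(String oid) {
        return docs.contains(oid);
    }

    // Insert with no uniqueness check, like adding a document to an index.
    synchronized void insert(String oid) {
        docs.add(oid);
    }

    // Query and insert under a single lock, so no other client can interleave.
    synchronized boolean insertIfAbsent(String oid) {
        if (docs.contains(oid)) {
            return false;
        }
        docs.add(oid);
        return true;
    }

    // Number of stored documents carrying this OID.
    synchronized int count(String oid) {
        int n = 0;
        for (String d : docs) {
            if (d.equals(oid)) {
                n++;
            }
        }
        return n;
    }
}

final class DuplicateScenario {
    // Replays the interleaving from the email: both clients query first,
    // both see nothing, then both insert.
    static int unguarded() {
        OidStore store = new OidStore();
        boolean foundByA = store.contains("some field");
        boolean foundByB = store.contains("some field");
        if (!foundByA) {
            store.insert("some field");
        }
        if (!foundByB) {
            store.insert("some field");
        }
        return store.count("some field");
    }

    // Same two clients, but each query-then-insert is one locked step.
    static int guarded() {
        OidStore store = new OidStore();
        store.insertIfAbsent("some field"); // client A
        store.insertIfAbsent("some field"); // client B sees A's document
        return store.count("some field");
    }
}
```

In this model unguarded() leaves two documents with the same OID, while guarded() leaves one. The point is only that the check and the write have to be tied together, by a lock as here or by collision detection at commit time.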
I understand that Lucene is not a database, and has no concept of
unique constraints. It is my understanding that this has been overcome
using locks and sequential access to the index when writing.
In a simple XA implementation, client A would open a SERIALIZABLE
transaction, which would block B from even reading the index. Most
simple XA implementations only support READ_COMMITTED, SERIALIZABLE,
and NONE.
There are other ways of offering finer grained locking (based on
internal id and timestamps), but most are going to need a "server
based" implementation of lucene to pull off.
To summarize, I think the "shared filestore (NFS)" and "lockless
commits" make implementing transactions very difficult. I am sure I
am missing something here, I just don't see what.
On Jan 23, 2008, at 8:53 AM, Mark Miller wrote:
That's where Robert is confusing me as well. To have XA support you
just need to be able to define a transaction, atomically commit, or
rollback. You also need a consistent state after any of these
operations. LUCENE-1044 seems to guarantee that, and so isn't it
more like finishing up needed work than going down the wrong path?
It seems more to me (and obviously I know a lot less about this
than either of you) that you have just gotten Lucene ready to add
XA support. Lucene now fulfills all of the requirements. No?
Someone just needs to write a boatload of JTA code :)
It would seem the next step would be, as Robert suggests, to make a
transaction a first class citizen. The XA protocol will require
Lucene to communicate with the TM about what transactions it has
completed to help in failure recovery and transaction management. I
can certainly see the need for a better transaction abstraction to
help with this.
A little enlightenment on this would be great, Robert. I am very
interested in it for future projects.
And I have to point out...it just seems logical that we would make
things so that the index was consistent at some point before taking
the next step of making it consistent with other resources...no? I
am just still confused about Robert's objections to what is going on
here. I think that it would be a real leap forward to get it done
though.
Also, as he mentioned, we really need a good distributed system
that allows for index partitioning. That's the ticket to more
enterprise adoption. Could be Solr's work though...
Michael McCandless wrote:
Robert, besides LUCENE-1044 (syncing on commit), what is the Lucene
core missing in order for you (or, someone) to build XA compliance on
top of it?
I.e., you can open a writer with autoCommit=false and no changes are
committed until you close it. You can abort the session by calling
writer.abort(). What's still missing, besides LUCENE-1044?
Mike
robert engels wrote:
One more example on this. A lot of work was done on transaction
support. I would argue that this falls way short of what is
needed, since there is no XA transaction support. Since the
lucene index (unless stored in an XA db) is a separate resource,
it really needs XA support in order to be consistent with the
other resources.
All of the transaction work that has been performed only
guarantees that barring a physical hardware failure the lucene
index can be opened and used at a known state. This index though
is probably not consistent with the other resources.
All that was done is that we can now guarantee that the index is
consistent at SOME point in time.
Given the work that was done, we are probably closer to adding XA
support, but I think this would be much easier if the concept of
a transaction was made first class through the API (and then XA
transactions need to be supported).
On Jan 22, 2008, at 2:49 PM, robert engels wrote:
I don't think group C is interested in bug fixes. I just don't
see how Lucene is at all useful if the users are encountering
any bug - so they either don't use that feature, or they have
already developed a work-around (or they have patched the code
in a way that avoids the bug, yet is specific to their
environment).
For example, I think the NFS work (bugs, fixes, etc.) was quite
substantial. I think the actual number of people trying to use
NFS is probably very low - as the initial implementation had so
many problems (and IMO is not a very good solution for
distributed indexes anyway). So all the work in trying to make
NFS work "correctly" behind the scenes may have been
inefficient, since a more direct, yet major fix may have solved
the problem better (like distributed server support, not shared
index access).
I just think that trying to maintain API compatibility through
major releases is a bad idea. Leads to bloat, and complex code -
both internal and external. Achieving great gains in usability
and/or performance in a mature product like Lucene almost certainly
requires massive changes to the processes, algorithms and
structures, and the API should change as well to reflect this.
On Jan 22, 2008, at 2:30 PM, Chris Hostetter wrote:
: If they are " no longer actively developing the portion of the code
: that's broken, aren't seeking the new feature, etc", and they stay
: back on old versions... isn't that exactly what we want? They can
: stay on the old version, and new application development uses the
: newer version.
This basically mirrors a philosophy that is rising in the Perl
community (evangelized by a really smart dude named chromatic) ...
"why are we worrying about the effect of upgrades on users who
don't upgrade?"
The problem is not all users are created equal and not all users
upgrade for the same reasons or at the same time...
Group A: If someone is paranoid about upgrading, and is still running
lucene1.4.3 because they are afraid if they upgrade their app will
break and they don't want to deal with it; they don't care about
known bugs in lucene1.4.3, as long as those bugs haven't impacted
them yet -- these people aren't going to care whether we add a bunch
of new methods to interfaces, or remove a bunch of public methods
from arbitrary releases, because they are never going to see them.
They might do a total rewrite of their project later, and they'll
worry about it then (when they have lots of time and QA resources).
Group B: At the other extreme are the "free-spirited" developers
(god i hate that the word "agile" has been co-opted) who are always
eager to upgrade to get the latest bells and whistles, and don't mind
making changes to code and recompiling every time they upgrade --
just as long as there are some decent docs on what to change.
Group C: In the middle is a large group of people who are interested
in upgrading, who want bug fixes, are willing to write new code to
take advantage of new features, in some cases are even willing to
make small or medium changes to their code to get really good
performance improvements ... but they don't have a lot of time or
energy to constantly rewrite big chunks of their app. For these
people, knowing that they can "drop in" the new version and it will
work is a big reason why they are willing to upgrade, and why they
are willing to spend some time tweaking code to take advantage of
the new features and the new performance-enhanced APIs -- because
they don't have to spend a lot of time just to get the app working
as well as it was before.
To draw an analogy...
Group A will stand in one place for a really long time no matter how
easy the path is. Once in a great while they will decide to march
forward dozens of miles in one big push, but only once they feel
they have adequate resources to make the entire trip at once.
Group B likes to frolic, and will happily take two steps backward
and then three steps forward every day.
Group C will walk forward with you at a steady pace, and occasionally
even take a step back before moving forward, but only if the path is
clear and not very steep.
: I bet, if you did a poll of all Lucene users, you would find a
: majority of them still only run 1.4.3, or maybe 1.9. Even with 2.0,
: 2.3, or 3.0, that is still going to be the case.
That's probably true, but a nice perk of our current backwards
compatibility commitments is that when people pop up asking questions
about 1.4.3, we can give them advice like "upgrading to 2.0.0 solves
your problem" and that advice isn't a death sentence -- the steps to
move forward are small and easy.
I look at the way things like Maven v1 vs v2 worked out, and how
that fractured the community for a long time (as far as i can tell
it's still pretty fractured) because the path from v1 to v2 was so
steep and involved backtracking so much, and i worry that if we make
changes to our "compatibility pledge" that don't allow for an even
forward walk, we'll wind up with a heavily fractured community.
-Hoss
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]