Re: Back Compatibility

robert engels Wed, 23 Jan 2008 08:55:07 -0800

Maybe I don't understand lockless commits then.

I just don't think you can enforce transactional consistency without either 1) locking, or 2) optimistic collision detection. I could be wrong here, but this has been my experience.

By effectively removing the locking requirement, I think you are going to have users developing code without thought as to what is going to happen when locking is added. This is going to break the backwards compatibility that people are striving for.


The lucene "writer" structure needs to be something like:

start tx for update
do work
commit

where commit is composed of prepare and commit phases, and commit may fail.
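
As a rough illustration of that structure, a minimal Java sketch might look like the following. Every name here (TxWriterSketch, begin, add, prepare, commitPrepared, rollback) is hypothetical and is not Lucene API. The "index" is just an in-memory list, and an empty document is an assumed stand-in for whatever might make the prepare phase fail.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names are Lucene API.
class TxWriterSketch {
    private final List<String> committed = new ArrayList<>(); // visible state
    private List<String> pending;                             // changes of the open tx
    private boolean prepared;

    // "start tx for update"
    void begin() {
        if (pending != null) throw new IllegalStateException("transaction already open");
        pending = new ArrayList<>();
        prepared = false;
    }

    // "do work": changes are buffered and stay invisible until commit
    void add(String doc) {
        requireOpen();
        pending.add(doc);
        prepared = false;
    }

    // Phase 1: check that the commit can succeed. An empty document is an
    // assumed stand-in for any failure and makes prepare fail.
    boolean prepare() {
        requireOpen();
        prepared = pending.stream().noneMatch(String::isEmpty);
        return prepared;
    }

    // Phase 2: publish the prepared changes.
    void commitPrepared() {
        requireOpen();
        if (!prepared) throw new IllegalStateException("prepare() has not succeeded");
        committed.addAll(pending);
        pending = null;
    }

    // Discard all changes of the open transaction.
    void rollback() {
        requireOpen();
        pending = null;
    }

    // "commit": runs both phases; rolls back and returns false if prepare fails.
    boolean commit() {
        if (!prepare()) {
            rollback();
            return false;
        }
        commitPrepared();
        return true;
    }

    // Snapshot of the committed documents.
    List<String> committedDocs() {
        return new ArrayList<>(committed);
    }

    private void requireOpen() {
        if (pending == null) throw new IllegalStateException("no open transaction");
    }
}
```

Keeping prepare() and commitPrepared() separate is what would let an external transaction manager drive the two phases itself; commit() is only the convenience path for a single resource.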

It is unknown if this can actually happen though, since there is no unique ID that could cause collisions, but there is the internal id (which would need to remain constant throughout the tx in order for queries and delete operations to work).

I am sure it is that I don't understand lockless commits, so I will give a scenario.

client A issues query looking for documents with OID (a field) = "some field"

client B issues same query
both queries return nothing found
client A inserts document with OID = "some field"
client B inserts document with OID = "some field"

client A commits and client B commits

Unless B is blocked once A issues the query, the index is going to end up with 2 different copies of the document.
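
The scenario, and how optimistic collision detection would catch it, can be sketched in a few lines of Java. This is a toy in-memory model under stated assumptions, not Lucene code: OptimisticIndexSketch, Tx, and the version counter are all hypothetical names, and "collision" is simplified to "someone else committed since this transaction began".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a toy in-memory "index", not Lucene API.
class OptimisticIndexSketch {
    private final List<String> oids = new ArrayList<>(); // committed OID values
    private long version = 0;                            // bumped on every commit

    // A transaction remembers the index version it started from.
    final class Tx {
        private final long startVersion;
        private final List<String> inserts = new ArrayList<>();

        private Tx(long startVersion) {
            this.startVersion = startVersion;
        }

        // Queries see committed state only.
        boolean exists(String oid) {
            synchronized (OptimisticIndexSketch.this) {
                return oids.contains(oid);
            }
        }

        // Inserts are buffered until commit.
        void insert(String oid) {
            inserts.add(oid);
        }

        // Optimistic commit: fail if anyone committed since this tx began.
        // The lock is held only for this short step, not across the tx.
        boolean commit() {
            synchronized (OptimisticIndexSketch.this) {
                if (version != startVersion) return false; // collision detected
                oids.addAll(inserts);
                version++;
                return true;
            }
        }

        // Commit with no collision check, as in the scenario above.
        void commitUnchecked() {
            synchronized (OptimisticIndexSketch.this) {
                oids.addAll(inserts);
                version++;
            }
        }
    }

    synchronized Tx begin() {
        return new Tx(version);
    }

    // Number of committed documents carrying the given OID.
    synchronized int count(String oid) {
        int n = 0;
        for (String o : oids) {
            if (o.equals(oid)) n++;
        }
        return n;
    }
}
```

If clients A and B both call begin(), both see exists("some field") return false, and both insert, then commitUnchecked() leaves two copies in the index, while commit() lets A succeed and makes B fail so that B can re-run its query and retry.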

I understand that Lucene is not a database, and has no concept of unique constraints. It is my understanding that this has been overcome using locks and sequential access to the index when writing.

In a simple XA implementation, client A would open a SERIALIZABLE transaction, which would block B from even reading the index. Most simple XA implementations only support READ_COMMITTED, SERIALIZABLE, and NONE.

There are other ways of offering finer grained locking (based on internal id and timestamps), but most are going to need a "server based" implementation of Lucene to pull off.

To summarize, I think the "shared filestore (NFS)" and "lockless commits" make implementing transactions very difficult. I am sure I am missing something here, I just don't see what.


On Jan 23, 2008, at 8:53 AM, Mark Miller wrote:

That's where Robert is confusing me as well. To have XA support you just need to be able to define a transaction, atomically commit, or rollback. You also need a consistent state after any of these operations. LUCENE-1044 seems to guarantee that, and so isn't it more like finishing up needed work than going down the wrong path? It seems more to me (and obviously I know a lot less about this than either of you) that you have just gotten Lucene ready to add XA support. Lucene now fulfills all of the requirements. No? Someone just needs to write a boatload of JTA code :)
It would seem the next step would be, as Robert suggests, to make a transaction a first class citizen. The XA protocol will require Lucene to communicate with the TM about what transactions it has completed to help in failure recovery and transaction management. I can certainly see the need for a better transaction abstraction to help with this.
A little enlightenment on this would be great, Robert. I am very interested in it for future projects.
And I have to point out...it just seems logical that we would make things so that the index was consistent at some point before taking the next step of making it consistent with other resources...no? I am just still confused about Robert's objections to what is going on here. I think that it would be a real leap forward to get it done though.
Also, as he mentioned, we really need a good distributed system that allows for index partitioning. That's the ticket to more enterprise adoption. Could be Solr's work though...
Michael McCandless wrote:
Robert, besides LUCENE-1044 (syncing on commit), what is the Lucene
core missing in order for you (or, someone) to build XA compliance on
top of it?

Ie, you can open a writer with autoCommit=false and no changes are
committed until you close it.  You can abort the session by calling
writer.abort().  What's still missing, besides LUCENE-1044?

Mike

robert engels wrote:
One more example on this. A lot of work was done on transaction support. I would argue that this falls way short of what is needed, since there is no XA transaction support. Since the Lucene index (unless stored in an XA db) is a separate resource, it really needs XA support in order to be consistent with the other resources.
All of the transaction work that has been performed only guarantees that barring a physical hardware failure the Lucene index can be opened and used at a known state. This index though is probably not consistent with the other resources.
All that was done is that we can now guarantee that the index is consistent at SOME point in time.
Given the work that was done, we are probably closer to adding XA support, but I think this would be much easier if the concept of a transaction was made first class through the API (and then XA transactions need to be supported).
On Jan 22, 2008, at 2:49 PM, robert engels wrote:
I don't think group C is interested in bug fixes. I just don't see how Lucene is at all useful if the users are encountering any bug - so they either don't use that feature, or they have already developed a work-around (or they have patched the code in a way that avoids the bug, yet is specific to their environment).
For example, I think the NFS work (bugs, fixes, etc.) was quite substantial. I think the actual number of people trying to use NFS is probably very low - as the initial implementation had so many problems (and IMO is not a very good solution for distributed indexes anyway). So all the work in trying to make NFS work "correctly" behind the scenes may have been inefficient, since a more direct, yet major fix may have solved the problem better (like distributed server support, not shared index access).
I just think that trying to maintain API compatibility through major releases is a bad idea. It leads to bloat and complex code, both internal and external. Achieving great gains in usability and/or performance in a mature product like Lucene almost certainly requires massive changes to the processes, algorithms and structures, and the API should change as well to reflect this.
On Jan 22, 2008, at 2:30 PM, Chris Hostetter wrote:
: If they are "no longer actively developing the portion of the code that's
: broken, aren't seeking the new feature, etc", and they stay back on old
: versions... isn't that exactly what we want? They can stay on the old version,
: and new application development uses the newer version.

This basically mirrors a philosophy that is rising in the Perl
community evangelized by (a really smart dude named chromatic) ...
"why are we worrying about the effect of upgrades on users who don't upgrade?"

The problem is not all users are created equal and not all users upgrade
for the same reasons or at the same time...

Group A: If someone is paranoid about upgrading, and is still running
lucene 1.4.3 because they are afraid if they upgrade their app will break
and they don't want to deal with it; they don't care about known bugs in
lucene 1.4.3, as long as those bugs haven't impacted them yet -- these
people aren't going to care whether we add a bunch of new methods to
interfaces, or remove a bunch of public methods from arbitrary releases,
because they are never going to see them. They might do a total rewrite
of their project later, and they'll worry about it then (when they have
lots of time and QA resources)

Group B: At the other extreme, are the "free-spirited" developers (god i
hate that the word "agile" has been co-opted) who are always eager to
upgrade to get the latest bells and whistles, and don't mind making
changes to code and recompiling every time they upgrade -- just as long as
there are some decent docs on what to change.

Group C: In the middle is a large group of people who are interested in
upgrading, who want bug fixes, are willing to write new code to take
advantage of new features, in some cases are even willing to make
small or medium changes to their code to get really good performance
improvements ... but they don't have a lot of time or energy to constantly
rewrite big chunks of their app. For these people, knowing that they can
"drop in" the new version and it will work is a big reason why they are
willing to upgrade, and why they are willing to spend some time
tweaking code to take advantage of the new features and the new
performance enhanced APIs -- because they don't have to spend a lot of time
just to get the app working as well as it was before.

To draw an analogy...
Group A will stand in one place for a really long time no matter how easy
the path is. Once in a great while they will decide to march forward
dozens of miles in one big push, but only once they feel they have
adequate resources to make the entire trip at once.

Group B likes to frolic, and will happily take two steps backward and
then 3 steps forward every day.

Group C will walk forward with you at a steady pace, and occasionally even
take a step back before moving forward, but only if the path is clear and
not very steep.

: I bet, if you did a poll of all Lucene users, you would find a majority of
: them still only run 1.4.3, or maybe 1.9. Even with 2.0, 2.3, or 3.0, that is
: still going to be the case.

That's probably true, but a nice perk of our current backwards
compatibility commitments is that when people pop up asking questions
about 1.4.3, we can give them advice like "upgrading to 2.0.0 solves your
problem" and that advice isn't a death sentence -- the steps to move
forward are small and easy.

I look at the way things like Maven v1 vs v2 worked out, and how
that fractured the community for a long time (as far as i can tell it's
still pretty fractured) because the path from v1 to v2 was so steep and
involved backtracking so much and i worry that if we make changes to our
"compatibility pledge" that don't allow for an even forward walk, we'll
wind up with a heavily fractured community.



-Hoss
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]