Hi there everyone,

Just throwing my 5c in the ring (again). Not sure how much use this is going to be, but it might be... ignore freely :)

I've used Lucene since v1 (Java) and Lucene.Net since v1.something (I think) in various projects. Actually, in any project I can get it into :) Quest Archive Manager (then called AfterMail) was the first one in .NET (which Quest bought for about $40m, not from me sadly). Then www.topgear.com (search and related items calculations), ComArchive (who I'm working with these days), and a few other places.

At a guess, annual revenue from projects _I've_ worked on which use Lucene.Net as a core component: $10m+/year. And I'm just one person. BTW, most of the indexes I've worked with are in the hundreds of GB range. It's performed amazingly. I'm not going to count Umbraco, which I've been working on of late, as it's kinda invisible in there. And that's part of the problem, I think.

The biggest "problems" I've had with Lucene.Net are three things:

1. The code is dense and very complex. I'm not a search theorist, so a lot of it - even just the terms - makes no sense to me at all. The API is fine, but once I try to dig into it, I get lost in a maze of term vectors and other stuff. Not L.N's fault - or the porters' - it's just a complex piece of software! I never thought to contribute because, basically, it was like looking into a big black hole.

2. For me, it's been stable. VERY VERY stable. Like my Mac, it "just works". I've had a few problems from time to time (e.g. trying to do multithreaded access with v1.x :) ), and I've had to rebuild a few of those huge indexes (cue a week's downtime for a client), but 99% of the time (in production anyway), it's been perfect. So I've not thought to contribute, because I didn't have an itch to scratch. I know one of my co-workers has, and I think he's sent in patches (or worked closely with the maintainers), but generally, it's been a highly functional, highly stable black box.

3. The website (well, the incubator one). I find it impossible to find anything, impossible to find news (why is there no official release? oh, _that's_ why... etc (or not)) etc. I find this true of all the Apache sites/projects I've used or tried to use, so this is not a Lucene thing. Maybe I was just looking in the wrong place, or I don't get the way the pages are structured, but that's how it is for me. This would be the area I'd be looking at once I get back from being away (mid-February) unless someone else has done it.

So, once I get back, I'm going to do something about 3.

Maybe the solution to 1 and 2 is to get a port, a la NGit (i.e. using something like Sharpen), working 95% automatically, making each Java release 1-2 weeks' work, not 3+ months. That might require changes to Sharpen, but if we start with the code from the NGit guy, it might help a lot. Once that's there, with passing tests etc, then build a .NET-style layer on top for those who need it or want it, so one team/person can port the Java, while others maintain the layer (a la NANT + Nant.contrib).

Righto. Thought over. :)

Nic