Hi there, everyone

Just throwing my 5c in the ring (again). Not sure how much use this is
going to be, but it might be... Ignore freely :)

I've used Lucene since v1 (Java) and Lucene.Net since v1.something (I
think) in various projects. Actually, in any project I can get it into
:) Quest Archive Manager (then called AfterMail, which Quest bought for
about $40m, not from me sadly) was the first one in .NET. Others
include www.topgear.com (search and related-items calculations),
ComArchive (who I'm working with these days), and a few other places.
At a guess, annual revenue from projects _I've_ worked on that use
Lucene.Net as a core component is $10m+/year. And I'm just one person.

BTW, most of the indexes I've worked with are in the hundreds-of-GB
range, and it's performed amazingly. I'm not going to count Umbraco,
which I've been working on of late, as it's kinda invisible in there.
And that's part of the problem, I think.

The biggest "problems" I've had with Lucene.Net are three things:

1. The code is dense and very complex. I'm not a search theorist, so a
lot of it (even just the terminology) makes no sense to me at all. The
API is fine, but once I try to dig into it, I get lost in a maze of
term vectors and other stuff. That's not Lucene.Net's fault, or the
porters'; it's just a complex piece of software! I never thought to
contribute because, basically, it was like looking into a big black hole.

2. For me, it's been stable. VERY VERY stable. Like my Mac, it "just
works". I've had a few problems from time to time (e.g. trying to do
multithreaded access with v1.x :) ), and I've had to rebuild a few of
those huge indexes (cue a week's downtime for a client), but 99% of
the time (in production, anyway) it's been perfect. So I've not
thought to contribute, because I didn't have an itch to scratch. I
know one of my co-workers has, and I think he's sent in patches (or
worked closely with the maintainers), but generally it's been a
highly functional, highly stable black box.

3. The website (well, the incubator one). I find it impossible to find
anything, and impossible to find news (why is there no official
release? Oh, _that's_ why... or not). I find this true of all the
Apache sites/projects I've used or tried to use, so this is not a
Lucene thing. Maybe I was just looking in the wrong place, or I don't
get the way the pages are structured, but that's how it is for me.
This would be the area I'd be looking at once I get back from being
away (mid-February), unless someone else has done it.

So, once I get back, I'm going to do something about 3. Maybe the
solution to 1 and 2 is to get a port à la NGit (i.e. using something
like Sharpen) working 95% automatically, so that porting each Java
release takes 1-2 weeks of work, not 3+ months. That might require
changes to Sharpen, but if we start with the code from the NGit guy,
it might help a lot.

Once that's there, with passing tests etc., we could build a
.NET-style layer on top for those who need or want it, so that one
team/person can port the Java while others maintain the layer (à la
NAnt + NAntContrib).

Righto. Thought over.

:)

Nic
