All,

About a year and a half ago I ported Lucene 2.9.x to C# and kept it up to date until ~3.0. I didn't plan to ever release the code; however, today I've done so on Codeplex under the name Lucille (http://lucille.codeplex.com/).
My primary reason for doing this port was to work on the internals of the indexing and searching engines, both to get better performance on .NET and to deal with some long-standing bugs that were preventing me from using Lucene.Net in high-stress applications. For what you folks on Lucene.Net are doing, there may be some code in what I've released that will help you move forward with your efforts to get to 3.0.x. Feel free to take what you need; I've kept the ASF license in place (I'm not a lawyer...).

I'd be surprised if a new, start-over port of Lucene 3.0.x to C# were anything but a big project. When I did this port I used JLCA and quickly discovered that there was a ton of line-by-line review, modification and testing to do. During that work I decided to switch to generics while keeping the internals as close to Java Lucene as possible. I tried to predict how Java Lucene would look once 3.x was released and the Java folks began their conversion to Java 1.5 (i.e. Java generics), so my API almost certainly differs from Lucene's. There are also some chunks of code that need to be rewritten for .NET, specifically the use of WeakReferences for caching; using Sharpen isn't likely to make that go away.

I stopped tracking Lucene around Sept 2009, just before their switch to Java 1.5. The main reason I stopped was that I had achieved my objectives and had what I wanted out of the code.

I'd like to address the inevitable question of why I didn't just contribute this to Lucene.Net. The primary reason is that I wasn't really on board with the constraints of Lucene.Net; specifically, my interest wasn't in maintaining compatibility with Lucene. I needed to completely replace the analysis and parsing code with my own. That said, a lot of the code in this port looks like what one would want from a Lucene-to-C# port: you'll see lots of places where I'm using generics and properties and changing things to make the entire release more .NET friendly.
There's still a long way to go with that, but this is one area I'm committed to making even better.

Future work: I'm going to start reviewing the Lucene check-ins since Sept 2009 and see what I need to bring this to a semi-official 3.0 release. The API remains pretty close to Lucene.Net and Java Lucene, but it's not really 1-to-1 and I don't plan to make it so. My #1 goal when I started this was to have the fastest, most reliable search engine I could get, and I still think that's worth pursuing.

Thanks,
Scott

Some details that you should know:

-- This is current as of Java Lucene SVN revision 803339. That was fairly close to the Java Lucene 3.0 release; one reason I stopped there was that I really didn't like what was happening with Attributes in Java Lucene.
-- This is a VS2010 solution and builds under .NET 3.5. I have built it under 4.0, but there are a couple of issues I haven't found an obvious way to correct.
-- This will *not* compile on .NET 2. I use generics whenever I can, and LINQ whenever it's appropriate. When I add new code it's .NET 3.5 (this includes lots of vars).
-- I'm sure it would be possible to create a VS2005 solution. If someone wants to create one, upload a patch and I'll include it.
-- Unit tests require NUnit, and for the most part they pass cleanly, except for a few that exercise files and some of the subtle threading differences between Java and .NET. A few unit tests fail intermittently; they are related to threading and file access. At some point I need to sit down and dig into those problems, but the writers/readers in this code are pretty complicated. Unfortunately, too many of the unit tests are really integration tests, so the entire suite takes upwards of 6 minutes to finish.
-- The release includes none of the contrib code that exists in Java Lucene or Lucene.Net.
-- I can't release some of the higher-performance internals I have because they belong to someone else. It took me a few days to excise them from what I'm releasing.
-- I have a pretty large list of TODOs. There is a bunch of code in place that I plan to remove; it exists only to help out when porting new work from Java Lucene.
-- Name visibility is all over the place; lots of things are "public" that probably shouldn't be. Some of the mechanical tools I used didn't convert these correctly, and at some point someone should go through and start hiding things that should be hidden.
-- There are lots of port artifacts left over that should be corrected (for example, Java's signed "byte" vs. C#'s unsigned byte and signed sbyte).
-- I've left some names in place even though I really want to change them, specifically "Directory", "Attribute" and "Document".
-- All of the public comments were stripped out during the Java to C# conversion.