All, I'm entering this conversation late as well. I'll apologize in advance, as I know this will be lengthy.
Briefly, here are my "credentials" and reasons for concern:

- I've been using Lucene.Net for many years, since the early versions, and have built significant products for my company with it. Those products are a core source of our revenue, which is measured in the millions of dollars. The success of my company's products is directly dependent on the success of the Lucene.Net project.
- I run software development at my company and make the final decisions about what we do and how we use our resources. The developers here work on open source code on our clock, and I would like to have them start doing this for Lucene.Net. We have very smart and productive people who could be a huge asset to this project. I hope the people running this project will not pass up the opportunity to leverage my company's team.
- I have hacked extensively on the Lucene.Net internals to improve performance in our product, and I have been manually maintaining our local branch, merging in changes from the main project. I feel I know enough about both the CS theory behind search engines and this codebase in particular not to be intimidated by any aspect of the project's needs.
- I started an open source project with a similar premise, a .Net implementation of an existing C++ open source project, and struggled with the "syntactic port" vs. "conceptual port" issue, so I have some perspective to offer on that discussion.

Relationship To ASF and Lucene
-----------------------------------------------
I'd like to address one thing up front: this should definitely remain an Apache Software Foundation project. As Grant and George have stated clearly and accurately, this is a huge benefit for the project in terms of its credibility. That is not just because the name is well respected; it's because of WHY the Apache name is so well respected: the processes and values of the Foundation set excellent standards, which encourage excellent code.
This is not just my opinion; it can be objectively demonstrated by the enormous success of the Apache projects. Complying with ASF's standards may be difficult, but it's extremely valuable.

I feel that Grant's recommendation to try to become a TLP (top-level project) at Apache is the wrong direction. This should remain part of the Lucene project. It doesn't differ from Lucene in any substantial way and thus doesn't warrant being separate.

Also, there was some mention of Lucene's file format and maintaining compatibility with it. This is essential. If that compatibility is ever lost, Lucene.Net will be useless. Being cross-platform and having a very stable on-disk format are among its most compelling aspects.

Microsoft's Interest and Involvement
---------------------------------------------------
Another thing to mention: Phil Haack and Scott Hanselman, while both Microsoft employees, are more than just representatives of the company they work for. They are both outstanding advocates of open source software and have been instrumental in the change of attitude Microsoft has shown towards this community in recent years. Their interest in this issue doesn't mean Microsoft is interested; it means this is a significant issue for the .Net open source community. The fact that they work for Microsoft means they may be able to leverage resources and wield clout from that vantage point in ways that benefit our community greatly.

Regarding the question "What can Microsoft do to help?"... I'll take a somewhat radical stance here. We need Visual J# not to have been abandoned. We need IronJava, like IronPython or IronRuby: a native, Microsoft-developed and supported, fully optimized and performant compiler for plain old Java code that runs on the .Net runtime and exposes Java libraries to other .Net languages like F#, C#, VB, etc. There is a huge wealth of open source Java code out there, much of it in the Apache project archives, and all of it would be "ported" at once.
Currently our community only has access to Lucene.Net, iTextSharp, and a few other libraries for which dedicated people like George have put in long hours of direct syntax porting to implement them in C#. We need more than that. I need Hadoop, HDFS, HBase, Solr, Nutch, Tika, and everything else in that ecosystem to run in .Net. My company is actually at a critical point now: we are considering abandoning .Net/WCF as our service layer platform and switching to Java so that we can leverage those excellent Java projects. Our business needs demand what Hadoop provides. It will be easier for me to migrate my application code to Java than to find equivalent functionality in the existing .Net world, write my own framework, or port Hadoop.

So, if there were ONE thing Microsoft could do to *significantly* help the .Net developer community, it would be providing a *real* implementation of IronJava, which would obviate the need to port code at all and simply allow those libraries and applications to run in .Net natively. That said, assuming Visual J# remains "retired" (see: http://msdn.microsoft.com/en-us/vjsharp/default ), this project is one of the few things we .Net developers have to work with.

Java or .Net Code Idioms
-------------------------------------
I agree that moving to a more idiomatic .Net codebase will not only improve the experience of Lucene.Net's end users but also increase the level of involvement we can get from the community. To put it simply, right now hacking on the Lucene.Net core code means you must understand Java idioms well and know how to translate them to .Net. That skill set is somewhat uncommon. The "direct port" methodology also leads to code that is not fully optimized for .Net. I have changed our local branch in a number of significant ways and improved performance significantly by doing so.
I didn't change the APIs; I just changed the implementations to be more appropriate for .Net and introduced generics. The test suite provided with Lucene/Lucene.Net is a great benefit in that regard and helped me ensure that my changes didn't break functionality.

That said, the project needs to improve in this regard. The classes themselves need to be implemented in a more "testable" manner. Using abstract base classes instead of interfaces makes the code less mockable and thus less testable. It also makes it harder to plug customized components into the system. A number of classes and members are sealed or internal that do not need to be.

Lucene (for Java) was awesome because it ran well as managed code and was elegant and efficient in Java's environment. Any port of Lucene should *retain those features* as well. The library should make sense and be implemented in the most elegant and efficient way possible on the platform it targets. Lucene.Net should not be a port of Java Lucene to .Net; it should be an *implementation* of Lucene running in .Net. Porting implies line-for-line similarity. Implementing just implies that all the features are represented. For that reason, I support moving to a more idiomatic .Net implementation, verified by the unit tests.

As for the argument that "it will require smart people" to understand the core code: that's a *GOOD* requirement. If you don't understand how it works conceptually, perhaps you should not be attempting to implement it. Merely porting or auto-converting code that "seems to be the same" and "passes the unit tests", without really understanding the details, is not a safe way to ensure correct operation. What if there were a subtle difference between the two syntaxes that led to differing (i.e., incorrect) behaviour in some scenarios? What if the unit tests didn't cover that scenario?

Regarding the help and support provided by the Lucene community, and the books and examples that provide code samples:
Changing to a more idiomatic .Net codebase, even if that meant top-level API changes, would not prevent a .Net developer from understanding example code written in Java. If the API is *basically* the same but uses foo.Size instead of foo.getSize()/foo.setSize(), or List<T> instead of ArrayList, those differences are minor and will not cause significant issues for grokking cross-language examples. People will still get it... and .Net developers will be much happier.

So, the takeaways are:
- My team and I will help hack on Lucene.Net and get paid to do it.
- Lucene.Net should not change project status.
- Microsoft should implement IronJava.
- Moving towards idiomatic .Net code is the direction the project should go, and it is not that big of a deal.

Also, as a side note: we're hiring in the Portland, Oregon area and could use developers who know Lucene.Net and want to hack on it on the clock. Send me your resume.

Thanks,
Troy Howard
Director of Software Development | discover-e Legal, LLC | thowar...@gmail.com