Doug: I agree with all of your comments except the following:
Third, part of the problem seems like there are too few contributors--that the challenges are big and the resources limited. Splitting the project will only spread those resources more thinly.
IMHO, there is a lot of duplicated effort, both on and off the FOSS domain. Crawling, file parsing, analyzers, incremental indexing, etc. are common discussion topics on every Lucene mailing list, which spreads resources across many duplicated efforts instead of behind a commonly agreed high-level API. Instead of branching or creating new projects, it is more efficient to develop libs (e.g., a Nutch crawler lib, an analyzer lib, etc.) so that other projects (on or off the FOSS domain) can re-use them; code-base sharing should be easy, not difficult.

This is exactly the reason NDFS became Hadoop. Now anyone can read the Hadoop API and combine it with Lucene without much trouble to run a Lucene index engine on top of Hadoop. A crawler or analyzer can be re-used in the same manner, and the same goes for indexing or searching, as you pointed out previously:

http://www.gossamer-threads.com/lists/lucene/general/41211

Again, I am not really proposing a new project, but rather easier-to-use, re-usable code. IMHO, Nutch would be an umbrella project for "ala-Google" search and Solr for "ala-Enterprise" search, where Lucene is the index lib and Hadoop is the Mapred/DFS lib. What is missing is a common crawler lib, a common indexing lib, etc.

Regards