Is the Droids lab at all related to that parsing project in Nutch?
There seem to be several related efforts here that could
probably make for a nice new project under Lucene, IMO. They all
seem to have to do with getting and preparing text for processing by
some type of consumer of text.
I sometimes wonder if the Analysis stuff in Lucene proper would
benefit from moving out of core too, but I'm not sure what it would
look like just yet and it is nice having it "optimized" for Lucene
versus having to support other types of analysis phases.
Just my two cents,
Grant
On Mar 1, 2007, at 11:42 AM, Jukka Zitting wrote:
Hi,
On 3/1/07, Rida Benjelloun <[EMAIL PROTECTED]> wrote:
Lius could be used as a starting point for the Tika project, if the Tika
committers are interested in it. We can also, as Mark said, decouple
Lius's parser logic from its indexing logic.
I'm very interested in doing that. Another very useful codebase, among
others, would be the existing parser framework in the Nutch project.
Taking the project into the Apache Incubator could also be interesting,
to get more people involved in it.
Exactly. I'd like to avoid starting just yet another codebase, and
focus more on bringing the best parts (both code and ideas) of the
existing projects together. The community-building focus of the
Incubator would likely help with that. Another aspect that would
benefit from Incubator scrutiny is the legal implications of
pulling together multiple document parser libraries under various
different licenses.
Would there be interest within the Lucene PMC in sponsoring a proposal
along such lines? I can volunteer to put together the proposal and act
as the champion and mentor of the project.
BR,
Jukka Zitting
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org
Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ