Hi Rida,

I've been talking with Jukka Zitting (involved in Nutch) about parsing/Tika, and we started to sketch out some project objectives on the wiki over there, which may be of interest: http://code.google.com/p/tika/w/list
I recently did a round-up of the main open source projects which maintain their own custom document parsing framework and counted over 17. There was a fair mix of approaches and parser choices, but a lot of commonality, suggesting a common project is both possible and useful. The wiki sketches above were an attempt to outline the requirements for such a common project and also to question where best to host it.

>> Tika is a really good project and I'm really interested in joining it.

I suspect one of the main differences between Lius and Tika's current objectives is that Tika aims to be independent of any application which consumes the parsed data (e.g. not tied to Lucene indexing classes). That said, I don't imagine it is too hard to decouple Lius's parser logic from its indexing logic.

Cheers,
Mark

----- Original Message ----
From: Rida Benjelloun <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Wednesday, 28 February, 2007 4:46:36 PM
Subject: Re: Lius into apache incubator

Hi Otis,

Many thanks for your comments, and I'm sorry for the late answer. I will add Lius as a Lucene contrib and I will change the licence to ASL. There are some developers contributing to Lius, but they are not very active.

On the question "this is a Laval University project, right? But you work at DocuLibre?": I developed Lius during my studies at Laval University and I am still the copyright owner of this project, so I can change the licence to ASL without any problem. Lius has been used in several projects at Laval University, and I decided to host it at Laval. I work at Laval and at Doculibre.

Tika is a really good project and I'm really interested in joining it.

Regards.

On 1/31/07, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
>
> Hi Rida,
>
> Some comments in no particular order:
>
> - Looks useful
>
> - This looks like a more expanded version of what Erik and I wrote for
> LIA, and I know people often ask about and use that code, so I know there
> is a need for a framework that knows how to parse various document formats
>
> - Nutch has some of the document parsing code written in the form of
> plugins. A few people wanted to decouple that from Nutch in a Tika project:
> http://code.google.com/p/tika/ . Not sure what the status is, I think
> only Jukka Zitting did any work there, but I think the initial idea was
> never fully finished. If LIUS joins Lucene, I think some of this
> duplication should be cleaned up, so we have only one framework for parsing
> various types of document formats.
>
> - Going through the Incubator is one way to go. Perhaps another way to
> get LIUS under Lucene is to just place it under contrib/, say contrib/lius.
>
> - Licensing would have to change to ASL and you would probably also have
> to send in your ASF CLA.
>
> - Any dependencies on GPL or LGPL or code released under other licenses
> would have to either be removed, or you'd have to fetch the required Jars at
> compile/build time. A few projects under Lucene contrib/ already do that, I
> believe.
>
> - Are there developers who are actively working on LIUS? Fixing bugs,
> adding features, keeping up with new versions of dependencies, etc.
>
> Otis
> P.S.
> Out of curiosity - this is a Laval University project, right? But you
> work at DocuLibre?
>
> ----- Original Message ----
> From: Rida Benjelloun <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org; java-dev@lucene.apache.org
> Sent: Tuesday, January 30, 2007 7:27:28 PM
> Subject: Lius into apache incubator
>
> Hi,
> I would like to add the Lius framework (http://sourceforge.net/projects/lius/)
> to the Apache Incubator.
> Are there any volunteers to do this job and to
> contribute to the development of this project?
>
> Thanks.
>
> Rida Benjelloun.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]